**Reiner Hähnle Wil van der Aalst (Eds.)**

# **Fundamental Approaches to Software Engineering**

**22nd International Conference, FASE 2019 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019 Prague, Czech Republic, April 6–11, 2019, Proceedings**

# Lecture Notes in Computer Science 11424

Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

## Editorial Board Members

David Hutchison, UK Josef Kittler, UK Friedemann Mattern, Switzerland Moni Naor, Israel Bernhard Steffen, Germany Doug Tygar, USA

Takeo Kanade, USA Jon M. Kleinberg, USA John C. Mitchell, USA C. Pandu Rangan, India Demetri Terzopoulos, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, Peking University, Beijing, China Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407


Editors Reiner Hähnle Technische Universität Darmstadt Darmstadt, Germany

Wil van der Aalst RWTH Aachen University Aachen, Germany

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-16721-9 ISBN 978-3-030-16722-6 (eBook) https://doi.org/10.1007/978-3-030-16722-6

Library of Congress Control Number: 2019936008

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the 22nd ETAPS! This was the first time that ETAPS took place in the Czech Republic, in its beautiful capital, Prague.

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security.

Organizing these conferences in a coherent, highly synchronized conference program enables participation in an exciting event, offering the possibility to meet many researchers working in different directions in the field and to easily attend talks of different conferences. ETAPS 2019 featured a new program item: the Mentoring Workshop. This workshop is intended to help students early in the program with advice on research, career, and life in the fields of computing that are covered by the ETAPS conference. On the weekend before the main conference, numerous satellite workshops took place and attracted many researchers from all over the globe.

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac Flanagan (University of California at Santa Cruz). Invited tutorials were provided by Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS 2019 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles University. Charles University was founded in 1348 and was the first university in Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Jan Vitek and Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek.

The ETAPS SC consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst (Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz), Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jurriaan Hage (Utrecht), Reiner Hähnle (Darmstadt), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller (Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg), Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local organization team for all their enormous efforts enabling a fantastic ETAPS in Prague!

February 2019 Joost-Pieter Katoen ETAPS SC Chair ETAPS e.V. President

## Preface

This volume contains the papers presented at the 22nd International Conference on Fundamental Approaches to Software Engineering (FASE 2019) held during April 9–11, 2019, in Prague. FASE 2019 was organized as part of the annual European Joint Conferences on Theory and Practice of Software (ETAPS 2019). ETAPS is the most important and visible annual European event related to software sciences.

As usual, the papers submitted to FASE focus on the foundations on which software engineering is built. The papers submitted covered topics such as software engineering, requirements engineering, software architectures, specification, software quality, validation, verification of functional and non-functional properties, model-driven development and model transformation, software processes, and software evolution.

We received 94 abstract submissions of which 74 were turned into full submissions (63 research papers, five tool papers, and six demo papers). We had submissions from the following countries (sorted based on the number of submissions): Germany, France, Canada, Estonia, USA, Argentina, UK, Norway, Spain, Brazil, China, South Korea, Australia, Czechia, Austria, Denmark, Italy, Japan, the Netherlands, Pakistan, South Africa, Tunisia, India, Poland, Portugal, Romania, Turkey, Belgium, Colombia, Macedonia, Malta, Sweden, and Ukraine.

Of the 74 submitted papers, 24 papers were accepted after reviewing and discussions among the Program Committee (PC) members (20 research papers, two tool papers, and two demo papers). This corresponds to a 32% acceptance rate. Besides the 30 PC members, there were 100 external reviewers. For the fourth time, FASE used a double-blind reviewing process. Overall, the reviewing process was smooth and it was possible to reach consensus on all decisions. We thank the PC members and reviewers for doing a great job!

Apart from thanking the authors, we also thank Marsha Chechik (University of Toronto) for contributing a paper based on her plenary ETAPS 2019 invited talk, which is also included in these proceedings. The title of Marsha's talk was "Software Assurance in an Uncertain World." She discussed the problem that software systems are deeply rooted in uncertainty since most complex open-world functionality is either not completely specifiable or it is not cost-effective to do so. Moreover, these systems are placed in an uncertain ever-evolving environment.

This volume shows that, despite the rapid progress in software engineering, there are still many open problems. These problems are important for the way we do business, the way we govern, and the way we socialize. We depend on complex software artifacts, yet we still need to fully understand how to best develop and maintain them. The papers in this volume help to progress the state of the art and hopefully inspire and influence future work.

We thank the ETAPS 2019 organizers, in particular, Jan Kofron and Jan Vitek (general chairs), Barbora Buhnova (publicity chair), Vojtech Horky and Artem Pelenitsyn (web chairs), and David Safranek (publications chair). We also thank Joost-Pieter Katoen, the ETAPS SC chair, for managing the whole process, and Gabriele Taentzer, the FASE SC chair, for swift feedback on several questions.

We hope that you will enjoy reading the volume.

February 2019 Wil van der Aalst Reiner Hähnle

## Organization

## Program Committee

| Name | Affiliation |
| --- | --- |
| Christel Baier | TU Dresden, Germany |
| Stefano Berardi | University of Turin, Italy |
| Mario Bravetti | University of Bologna, Italy |
| Jordi Cabot | Open University of Catalonia, Spain |
| Ana Cavalcanti | University of York, UK |
| Marsha Chechik | University of Toronto, Canada |
| Ferruccio Damiani | University of Turin, Italy |
| Ewen Denney | NASA Ames Research Center, USA |
| Dilian Gurov | KTH Royal Institute of Technology, Sweden |
| Reiner Hähnle | TU Darmstadt, Germany |
| Ludovic Henrio | CNRS, France |
| Gerti Kappel | Vienna University of Technology, Austria |
| Ekkart Kindler | Technical University of Denmark, Denmark |
| Martin Leucker | University of Lübeck, Germany |
| Jun Pang | University of Luxembourg, Luxembourg |
| André Platzer | Carnegie Mellon University, USA |
| Bernhard Rumpe | RWTH Aachen University, Germany |
| Alessandra Russo | Imperial College London, UK |
| Rick Salay | University of Toronto, Canada |
| Ina Schaefer | Technische Universität Braunschweig, Germany |
| Andy Schürr | TU Darmstadt, Germany |
| Perdita Stevens | The University of Edinburgh, UK |
| Mariëlle Stoelinga | University of Twente, The Netherlands |
| Jun Sun | Singapore University of Technology and Design, Singapore |
| Gabriele Taentzer | Philipps-Universität Marburg, Germany |
| Silvia Lizeth Tapia Tarifa | University of Oslo, Norway |
| Maurice H. Ter Beek | ISTI-CNR, Pisa, Italy |
| Wil M. P. van der Aalst | RWTH Aachen University, Germany |
| Heike Wehrheim | Paderborn University, Germany |
| Yingfei Xiong | Peking University, China |

## Additional Reviewers

Aspinall, David Bafrani, Mahsa Baxter, James Berti, Alessandro Bettini, Lorenzo Bill, Robert Bozzano, Marco Bubel, Richard Canovas Izquierdo, Javier Luis Cerone, Andrea Chen, Yifan Ciancia, Vincenzo Cordwell, Katherine Dalibor, Manuela Dashevskyi, Stanislav Din, Crystal Chang Drave, Imke Helene Ed-Douibi, Hamza Escobar, Santiago Ferrari, Alessio Fritsche, Lars Fulton, Nathan Gadyatskaya, Olga Gario, Marco Gerhold, Marcus Gerking, Christopher Giannini, Paola Girault, Alain Guanciale, Roberto Gómez, Abel Habermehl, Peter Haglund, Jonas Henderson, Robbie

Herda, Mihai Hillemacher, Steffen Johnsen, Einar Broch Kamburjan, Eduard Kharraz, Karam Knüppel, Alexander Kosiol, Jens König, Jürgen Lange, Felix Dino Laurent, Jonathan Leroy, Dorian Lidström, Christian Lienhardt, Michael Lindner, Andreas Lischke, Sabrina Lochau, Malte Lu, Sirui Luthmann, Lars Martínez, Salvador Mauro, Jacopo Mazzanti, Franco Meijer, Jeroen Mereuta, Radu Michael, Judith Mitsch, Stefan Miyazawa, Alvaro Mover, Sergio Najafzadeh, Mahsa Nassar, Nebras Netz, Lukas Oortwijn, Wytse Palmskog, Karl Paolini, Luca Papadakis, Michail

Papadakis, Mike Pedro, Andre Petrocchi, Marinella Pozzato, Gian Luca Raco, Deni Ren, Luyao Ribeiro, Pedro Ruijters, Enno Ruland, Sebastian Runge, Tobias Schivo, Stefano Schlatte, Rudolf Schlie, Alexander Schmalzing, David Schmitz, Malte Sharma, Arnab Shumeiko, Igor Sogokon, Andrew Spagnolo, Giorgio Oronzo Sproston, Jeremy Steffen, Martin Thoma, Daniel Thüm, Thomas Toews, Manuel Tomaszek, Stefan Tveito, Lars Wally, Bernhard Wang, Bo Wang, Guancheng Zacchiroli, Stefano Zawadzki, Erik Zhang, Yuhao Zhu, Qihao

# FASE Invited Talk

# **Software Assurance in an Uncertain World**

Marsha Chechik, Rick Salay, Torin Viger, Sahar Kokaly, and Mona Rahimi

> University of Toronto, Toronto, Canada chechik@cs.toronto.edu

**Abstract.** From financial services platforms to social networks to vehicle control, software has come to mediate many activities of daily life. Governing bodies and standards organizations have responded to this trend by creating regulations and standards to address issues such as safety, security and privacy. In this environment, the compliance of software development to standards and regulations has emerged as a key requirement. Compliance claims and arguments are often captured in assurance cases, with linked evidence of compliance. Evidence can come from testcases, verification proofs, human judgment, or a combination of these. That is, experts try to build (safety-critical) systems carefully according to well-justified methods and articulate these justifications in an assurance case that is ultimately judged by a human. Yet software is deeply rooted in uncertainty; most complex open-world functionality (e.g., perception of the state of the world by a self-driving vehicle) is either not completely specifiable or it is not cost-effective to do so; software systems are often to be placed into uncertain environments, and there can be uncertainties that need to be considered at the design phase. We argue that the role of assurance cases is to be the grand unifier for software development, focusing on capturing and managing uncertainty. We discuss three approaches for arguing about safety and security of software under uncertainty, in the absence of fully sound and complete methods: assurance argument rigor, semantic evidence composition and applicability to new kinds of systems, specifically those relying on ML.

## **1 Introduction**

From financial services platforms to social networks to vehicle control, software has come to mediate many activities of daily life. Governing bodies and standards organizations have responded to this trend by creating regulations and standards to address issues such as safety, security and privacy. In this environment, the compliance of software development to standards and regulations has emerged as a key requirement.

Development of safety-critical systems begins with *hazard analysis*, which aims to identify possible causes of harm. It uses the severity, probability and controllability of a hazard's occurrence to assign Safety Integrity Levels (in the automotive industry, these are referred to as ASILs [35]): the higher the ASIL, the more rigor is expected in identifying and mitigating the hazard. Mitigating hazards therefore becomes the main requirement of the system, with system safety requirements directly linked to the hazards. These requirements are then refined along the left-hand side (LHS) of the V-model until individual modules and their implementation can be built. The right-hand side (RHS) includes appropriate testing and validation, used as supporting evidence in developing an argument that the system adequately handles its hazards, with the expectation that the higher the ASIL, the stronger the required justification of safety.
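The mapping from hazard attributes to an integrity level can be illustrated with a small sketch. It assumes the ISO 26262 class indices S1 to S3 (severity), E1 to E4 (probability of exposure) and C1 to C3 (controllability), and it uses the widely quoted "sum" shorthand for the standard's determination table. The function name `asil` and the shorthand are assumptions made for illustration and do not come from this paper; the standard itself remains the authoritative source.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return an ASIL for class indices S1-S3, E1-E4, C1-C3.

    Assumption: the common "sum" shorthand for the ISO 26262
    determination table (sum 10 -> D, 9 -> C, 8 -> B, 7 -> A,
    anything lower -> QM, i.e. quality management only).
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4
            and 1 <= controllability <= 3):
        raise ValueError("expected S in 1..3, E in 1..4, C in 1..3")
    total = severity + exposure + controllability
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(total, "QM")


# Hypothetical hazards: the most severe, frequently encountered and
# hard-to-control hazard receives the most demanding level.
worst_case = asil(3, 4, 3)
minor_case = asil(1, 1, 1)
```

Under this shorthand, lowering any one attribute by a class lowers the resulting level by one step, which matches the text's point that more severe, more likely and less controllable hazards demand more rigor.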

Assurance claims and arguments are often captured by *assurance cases*, with linked evidence supporting them. Evidence can come from testcases, verification proofs, human judgment, or a combination of these. Assurance cases organize information allowing argument unfolding in a comprehensive way and ultimately allowing safety engineers to determine whether they trust that the system was adequately designed to avoid systematic faults (before delivery) and adequately detect and react to failures at runtime [35].

Yet software is deeply rooted in uncertainty; most complex open-world functionality (e.g., perception of the state of the world by a self-driving vehicle) is either not completely specifiable or it is not cost-effective to do so [12]. Software systems are often to be placed into uncertain environments [48], and there can be uncertainties that need to be considered at the design phase [20]. Thus, we believe that the role of assurance cases is to *explicitly capture and manage uncertainty coming from different sources, assess it and ultimately reduce it to an acceptable level, either with respect to a standard, company processes, or assessor judgment*. The various software development steps are currently not well integrated, and uncertainty is not expressed or managed explicitly in a uniform manner. Our claim in this paper is that *an assurance case is the unifier among the different software development steps, and can be used to make uncertainties explicit, which also makes them manageable. This provides a well-founded basis for modeling confidence about satisfaction of a critical system quality (security, safety, etc.) in an assurance case, making assurance cases play a crucial role in software development*. Specifically, we enumerate sources of uncertainty in software development. We also argue that organizing software development and analysis activities around the assurance case as a *living document* allows all parts of the software development to explicitly articulate uncertainty, steps taken to manage it, and the degree of confidence that artifacts acting as evidence have been performed correctly. This information can then help potential assessors in checking that the development outcome adequately satisfies the desired software quality (e.g., safety).

The area of system dependability has produced a significant body of work describing how to model assurance cases (e.g., [4,5,14,38]), and how to assess a reviewer's confidence in the argument being made (e.g., [16,31,45,59,60]). There is also early work on assessing the impact of change on the assurance argument when the system undergoes change [39]. A recent survey [43] provides a comprehensive list of assurance case tools developed over the past 20 years and an analysis of their functionalities including support for assurance case creation, assessment and maintenance. We believe that the road to truly making assurance cases the grand unifier for software development for complex high-assurance systems has many challenges. One is to be able to successfully argue about safety and security of software under uncertainty, without fully sound and complete methods. For that, we believe that *assurance arguments must be rigorous* and that we need to properly understand how to perform *evidence composition* for traditional systems, but also for *new kinds of systems*, specifically those relying on ML. We discuss these issues below.

**Rigor.** To be validated or reused, assurance case structures must be as rigorous as possible [51]. Of course, assurance arguments ultimately depend on human judgment (with some facts treated as "obvious" and "generally acceptable"), but the structure of the argument should be fully formal so that its completeness can be assessed. Bandur and McDermid called this approach "formal modulo engineering expertise" [1].

**Evidence Composition.** We need to effectively combine the top-down process of uncertainty reduction with the bottom-up process of composing evidence, specifically, evidence obtained from applying testing and verification techniques.

**Applicability to "new" kinds of systems.** We believe that our view – rigorous, uncertainty-reduction focused and evidence composing – is directly applicable to systems developed using machine learning, e.g., self-driving cars.

This paper is organized as follows: In Sect. 2, we briefly describe syntax of assurance cases. In Sect. 3, we outline possible sources of uncertainty encountered as part of system development. In Sect. 4, we describe the benefits of a rigorous language for assurance cases by way of example. In Sect. 5, we describe, again by way of example, a possible method of composing evidence. In Sect. 6, we develop a high-level assurance case for a pedestrian detection subsystem. We conclude in Sect. 7 with a discussion of possible challenges and opportunities.

## **2 Background on Assurance Case Modeling Notation**

The most commonly used representation for safety cases is the graphical Goal Structuring Notation (GSN) [30], which is intended to support the assurance of critical properties of systems (including safety). GSN is comprised of six core elements – see Fig. 1. Arguments in GSN are typically organized into a tree of the core elements shown in Fig. 1<sup>1</sup>. The root is the overall goal to be satisfied by the system, and it is gradually decomposed (possibly via strategies) into sub-goals and finally into solutions, which are the leaves of the safety case. Connections between goals, strategies and solutions represent *supported-by* relations, which indicate inferential or evidential relationships between elements. Goals and strategies may be optionally associated with some contexts, assumptions and/or justifications by means of *in-context-of* relations, which declare a contextual relationship between the connected elements.

<sup>1</sup> In this paper, we use both diamond and triangle shapes interchangeably to depict an "undeveloped" element.

**Fig. 1.** Core GSN elements from [30].

**Fig. 2.** Example safety case in GSN (from [30]).

For example, consider the safety case in Fig. 2. The overall goal **G1** is that the "Control System is acceptably safe to operate" given its role, context and definition, and it is decomposed into two sub-goals: **G2**, for eliminating and mitigating all identified hazards, and **G3**, for ensuring that the system software is developed to an appropriate ASIL. Assuming that all hazards have been identified, **G2** can in turn be decomposed into three sub-goals by considering each hazard separately (**S1**), and each separate hazard is shown to be satisfied using evidence from formal verification (**Sn1**) or fault tree analysis (**Sn2**). Similarly, under some specific context and justification, **G3** can be decomposed into two sub-goals, each of which is shown to be satisfied by the associated evidence.
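The tree structure described above can be sketched as a small data model. This is a minimal illustration, not a GSN tool: the class `Node`, the function `undeveloped`, the identifiers `G4` to `G6`, the node texts and the assignment of solutions to hazards are assumptions made for the example. The sub-goals of **G3** are deliberately left out so that the check has something to report.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    ident: str
    kind: str  # "goal", "strategy", "solution", "context", ...
    text: str
    supported_by: List["Node"] = field(default_factory=list)
    in_context_of: List["Node"] = field(default_factory=list)


def undeveloped(node: Node) -> Iterator[str]:
    """Yield identifiers of goals/strategies that nothing supports yet."""
    if node.kind in ("goal", "strategy") and not node.supported_by:
        yield node.ident
    for child in node.supported_by:
        yield from undeveloped(child)


# Solutions (leaves) named in the text.
sn1 = Node("Sn1", "solution", "Formal verification")
sn2 = Node("Sn2", "solution", "Fault tree analysis")

# Hypothetical per-hazard sub-goals under strategy S1.
g4 = Node("G4", "goal", "Hazard H1 addressed", [sn1])
g5 = Node("G5", "goal", "Hazard H2 addressed", [sn2])
g6 = Node("G6", "goal", "Hazard H3 addressed", [sn2])
s1 = Node("S1", "strategy", "Argument over each identified hazard",
          [g4, g5, g6])

g2 = Node("G2", "goal", "All identified hazards eliminated or mitigated",
          [s1])
g3 = Node("G3", "goal", "Software developed to an appropriate ASIL")
g1 = Node("G1", "goal", "Control System is acceptably safe to operate",
          [g2, g3])

open_items = list(undeveloped(g1))
```

Keeping *supported-by* and *in-context-of* as separate lists mirrors the two relation types of the notation, and a traversal such as `undeveloped` is one simple way to find goals that still lack evidence.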

## **3 Sources of Uncertainty in Software Development**

In this section, we briefly survey uncertainty in software development, broadly split into the categories of uncertainties about the specifications, about the environment, about the system itself, and about the argument of its safety. For each part, we aim to address how building an assurance case is related to understanding and mitigating such uncertainties.

**Uncertainty in Specifications.** Software specifications tend to suffer from incompleteness, inconsistency and ambiguity [42,46]. Specification uncertainty stems from a misunderstanding or an incomplete understanding of how the system is supposed to function in early phases of development; e.g., miscommunication and inability of stakeholders to transfer knowledge due to differing concepts and vocabularies [2,13]; unknown values for sets of known events (a.k.a. the *known unknowns*); and the unknown and unidentifiable events (a.k.a. the *unknown unknowns*) [57].

Recently, machine-learning approaches for interactively learning the software specifications have become popular; we discuss one such example, of pedestrian detection, in Sect. 6. Other mitigations of specification uncertainties, suggested by various standards and research, are identification of edge cases [36], hazard and obstacle analysis [55] to help identify unknown unknowns [35], step-wise refinement to handle partiality in specifications, ontology- [9] and information retrieval-driven requirements engineering approaches [21], as well as generally building arguments about addressing specification uncertainties.

**Environmental Uncertainty.** The system's environment can refer to adjacent agents interacting with the system, a human operator using the system, or physical conditions of the environment. Sources of environmental uncertainties have been thoroughly investigated [19,48]. One source originates from unpredictable and changing properties of the environment, e.g., assumptions about actions of other vehicles in the autonomous vehicle domain or assuming that a plane is on the runway if its wheels are turning. Another uncertainty source is input errors from broken sensors, missing, noisy and inaccurate input data, imprecise measurements, or disruptive control signals from adjacent systems. Yet another source might be when changes in the environment affect the specification. For example, consider a robotic arm that moves with the expected precision but the target has moved from its estimated position.

A number of techniques have been developed to mitigate environmental uncertainties, e.g., runtime monitoring systems such as RESIST [10], or machine-learning approaches such as FUSION [18] which self-tune the adaptive behavior of systems to unanticipated changes in the environment. More broadly, environmental uncertainties are mitigated by a careful requirements engineering process, by principled system design and, in assurance cases, by an argument that they have been adequately identified and adequately handled.

**System Uncertainties.** One important source of uncertainty is faced by developers who do not have sufficient information to make decisions about their system during development. For example, a developer may have insufficient information to choose a particular implementation platform. In [19,48], this source of uncertainty is referred to as *design-time uncertainty*, and some approaches to handling it are offered in [20]. Decisions made while resolving such uncertainties are crucial to put into an assurance argument, to capture the context, e.g., that a particular platform was selected because of its performance, at the expense of memory requirements.

Another uncertainty concerns the correctness of the implementation [7]. It lies in the V&V procedure and stems from whether the implementation of a verification tool can be trusted, whether the tool is used appropriately (that is, its assumptions are satisfied), and in general, whether a particular verification technique is the right one for verifying the fulfillment of the system requirements [15]. We address some of these uncertainties in Sect. 5.

**Argument Uncertainty.** The use of safety arguments to demonstrate safety of software-intensive systems raises questions such as the extent to which these arguments can be trusted. That is, how confident are we that verified, validated software is actually safe? How much evidence and how thorough an argument do we require for that?

To assess uncertainties which may affect the system's safety, researchers have proposed techniques to estimate confidence in structured assurance cases, either through qualitative or quantitative approaches [27,44]. The majority of these are based on the Dempster-Shafer Theory [31,60], Jøsang's Opinion Triangle [17], Bayesian Belief Networks (BBNs) [16,61], Evidential Reasoning (ER) [45] and weighted averages [59]. The approaches which use BBNs treat safety goals as nodes in the network and try to compute their conditional probability based on given probabilities for the leaf nodes of the network. Dempster-Shafer Theory is similar to BBNs but is based on the *belief function* and its *plausibility*, which are used to combine separate pieces of information to calculate the probability. The ER approach [45] allows the assessors to provide individual judgments concerning the trustworthiness and appropriateness of the evidence, building a separate argument from the assurance case.
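To make the Dempster-Shafer ingredients concrete, the following sketch implements Dempster's rule of combination together with the belief and plausibility functions over a two-element frame. It illustrates the general theory, not the specific methods of [31,60]; the source names `testing` and `review` and all mass values are made up for the example.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozenset hypotheses to masses summing to 1.
    Mass on empty intersections is conflict and is normalized away.
    """
    raw = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            raw[common] = raw.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {h: w / (1.0 - conflict) for h, w in raw.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for h, w in m.items() if h <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis."""
    return sum(w for h, w in m.items() if h & hypothesis)


SAFE = frozenset({"safe"})
UNSAFE = frozenset({"unsafe"})
EITHER = SAFE | UNSAFE  # mass here expresses ignorance

# Illustrative (invented) evidence from two sources.
testing = {SAFE: 0.6, EITHER: 0.4}
review = {UNSAFE: 0.5, EITHER: 0.5}

combined = combine(testing, review)
lower, upper = belief(combined, SAFE), plausibility(combined, SAFE)
```

The gap between `lower` and `upper` is the part of the evidence that neither supports nor contradicts the claim, which is exactly what a single probability in a BBN node cannot express.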

These approaches focus on assigning and propagating confidence measures but do not specifically address uncertainty in the argument. They also focus on aggregating evidence coming from multiple sources but treat it as a "black box", instead of examining how a piece of evidence from one source might compose with another. We look at these questions in Sects. 4 and 5, respectively.

## **4 Formality in Assurance Cases**

As discussed in Sect. 1, we believe that the ultimate goal of an assurance case is to explicitly capture and manage uncertainty, and ultimately reduce it to an acceptable level. Even informal arguments improve safety, e.g., by making people decompose the top level goal case-wise, and examine the decomposed parts critically. But the decomposed cases tend to have an ad hoc structure dictated by experience and preference, with under-explored completeness claims, giving both developers and regulators a false sense of confidence, no matter how confidence is measured, since they feel that their reasoning is rigorous even though it is not [58]. Moreover, as assurance cases are produced and judged by humans, they are typically based on *inductive arguments*. Such arguments are susceptible to fallacies (e.g., arguing through circular reasoning, using justification based on false dichotomies), and evaluations by different reviewers may lead to the discovery of different fallacies [28].

**Fig. 3.** A fragment of the Lane Management (LMS) Safety case.

There have been several attempts to improve credibility of an argument by making the argument structure more formal. [25] introduces the notion of confidence maps as an explicit way of reasoning about sources of doubt in an argument, and proposes justifying confidence in assurance arguments through *eliminative induction* (i.e., an argument by eliminating sources of doubt). [29] highlights the need to model both evidential and argumentation uncertainties when evaluating assurance arguments, and considers applications of the formally evaluatable extension of Toulmin's argument style proposed by [56]. [11] details VAA – a method for assessing assurance arguments based on Dempster-Shafer theory. Rushby [51] is a proponent of completely deductive reasoning, narrowing the scope of the argument so that it can be formalized and potentially formally checked, using automated theorem provers, arguing that this would give a modular framework for assessing (and, we presume, reusing) assurance cases. [1] relaxes Rushby's position a bit, aiming instead at formal assurance argumentation "modulo engineering expertise", and proof obligations about consistency of arguments remain valid even for not fully formal assurance arguments. To this end, they provided a specific formalization of goal validity given validity of subgoals and contexts/context assumptions, resulting in such rules as "assumptions on any given element must not be contradictory nor contradict the context assumed for that goal" [1].

**Fig. 4.** An alternative representation of the same LMS fragment.

**Our Position.** We believe that a degree of formality in assurance cases can go a long way not only towards establishing their validity, identifying and framing implicit uncertainties and avoiding fallacies, but also towards supporting assurance case modularity, refactoring and reuse. We illustrate this position on an example.

**Example.** Consider two partially developed assurance cases that argue that the lane management system (LMS) of a vehicle is safe (Figs. 3 and 4). The top-level safety goal **G1** in Fig. 3 is first decomposed by the strategy **Str1** into a set of subgoals which assert the safety of the LMS subsystems. An assessor can only trust that goals **G2** and **G3** imply **G1** by making an implicit assumption that the system safety is completely determined by the safety of its individual subsystems. Neither the need for this assumption nor the credibility of the assumption itself is made explicit in the assurance case, which weakens the argument and complicates the assessment process. The argument is further weakened by the absence of a completeness claim that all subsystems have been covered by this decomposition.

Strategies **Str2** and **Str3** in Fig. 3 decompose the safety claims about each subsystem into arguments over the relevant hazards. Yet the hazards themselves are never explicitly stated in the assurance case, making the direct relevance of each decomposed goal to its corresponding parent goal, and thus to the argument as a whole, unclear. While goals **G6** and **G9** attempt to provide completeness claims for their respective decompositions, they do so by citing lack of negative evidence without describing efforts to uncover such evidence. This justification is fallacious and can be categorized as "an argument from ignorance" [28].

Now consider the assurance case in Fig. 4 which presents a variant of the argument in Fig. 3, refined with context nodes, justification nodes and completeness claims. The top-level goal **G1** is decomposed into a set of subgoals asserting that particular hazards have been mitigated, as well as a completeness claim **G3C** stating that hazards **H1** and **H2** are the only ones that may be prevalent enough to defeat claim **G1**. Context nodes **C1** and **C2** define the hazards themselves, which clarifies the relevance of each hazard-mitigating goal. The node **J1** provides a justification for the validity of **Str1** by framing the decomposition as a proof by (exhaustive) cases. That is, **Str1** is justified by the statement that if **H1** and **H2** are the only hazards that could potentially make the system unsafe, then the system is safe if **H1** and **H2** have been adequately mitigated. This rigorous argument can be represented by the logical expression **G3C** ⟹ ((**G2** ∧ **G4**) ⟹ **G1**), and if completeness holds then **G2** and **G4** are sufficient to show **G1**. We now have a rigorous argument step that our confidence in **G1** is a direct consequence of confidence in its decomposed goals **G2**, **G3C** and **G4**, even though there may still be uncertainty in the evidential evaluation of **G2**, **G3C** and **G4**. That is, uncertainty has been made explicit and can be reasoned about at the evidential level. By removing argumentation uncertainty and explicating implicit assumptions, we get a more comprehensive framework for assurance case evaluation, where the relation between all reasoning steps is formally clear. Note that if the justification provides an inference rule, then the argument becomes deductive. Otherwise, it is weaker (the justification node can be used to quantify just *how much* weaker) but still rigorous.

While the completeness claim **G3C** in Fig. 4 may be directly supported by evidence, the goals **G2** and **G4** are further decomposed by the strategies **Str2** and **Str3**, respectively, which represent decompositions over subsystems. These strategies are structured similarly to **Str1**, and can be expressed by the logical expressions **G7C** ⟹ ((**G5** ∧ **G6**) ⟹ **G2**) and **G10C** ⟹ ((**G8** ∧ **G9**) ⟹ **G4**), respectively. In Fig. 3, a decomposition by subsystems was applied directly to the top-level safety goal which necessitated a completeness claim that the safety of all individual subsystems implied safety of the entire system. Instead, the argument in Fig. 4 only needs to show that the set of subsystems in each decomposition is complete w.r.t. a particular hazard, which may be a more feasible claim to argue. This ability to transform an argument into a more easily justifiable form is another benefit of arguing via rigorous reasoning steps.
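The role of the completeness claim in the **Str1** step can be illustrated with a small truth-table check. The following is only a sketch: the goals of Fig. 4 are treated as plain propositional variables, and the helper names `implies` and `entails` are introduced here for illustration and are not part of any assurance case tool.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

def entails(premises, conclusion, names):
    """True if every truth assignment satisfying all premises
    also satisfies the conclusion (brute-force truth table)."""
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False
    return True

NAMES = ["G1", "G2", "G3C", "G4"]

# J1: the proof-by-cases justification, G3C => ((G2 and G4) => G1).
j1 = lambda e: implies(e["G3C"], implies(e["G2"] and e["G4"], e["G1"]))

# With J1 and all three subgoals, G1 follows deductively.
with_completeness = entails(
    [j1, lambda e: e["G2"], lambda e: e["G3C"], lambda e: e["G4"]],
    lambda e: e["G1"],
    NAMES,
)

# Dropping the completeness claim G3C breaks the step.
without_completeness = entails(
    [j1, lambda e: e["G2"], lambda e: e["G4"]],
    lambda e: e["G1"],
    NAMES,
)

print(with_completeness, without_completeness)
```

The check says nothing about whether **G2**, **G3C** or **G4** actually hold; that remains a matter of evidence. It only shows that the argument step itself is valid once the completeness claim is stated.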

## **5 Combining Evidence**

Evidence for assurance cases can come from a variety of sources: results from different testing and verification techniques, human judgment, or their combination. Multiple testing and verification techniques may be used to make the evidence more complete. A verification technique *complements* another if it is able to verify types of requirements which cannot be verified by the other technique. For example, results of verification of properties via a bounded model checker (BMC) are complemented by additional test cases [8]. A verification technique *supports* another if it is used to detect faults in the other's verification results, thus providing backing evidence [33]. For example, a model checking technique may support a static analysis technique by verifying the faults detected [6]. Note that these approaches are fundamentally different from merely aggregating evidence and treating it as a black box!

**Fig. 5.** Confidence argument for code review workflow (from [6]).

Habli and Kelly [32] and Denney and Pai [15] present safety case patterns for the use of formal method results for certification. Bennion et al. [3] present a safety case for arguing the compliance of a particular model checker, namely, the Simulink Design Verifier for DO-178C. Gallina and Andrews [23] argue about adequacy of a model-based testing process, and Carlan et al. [7] provide a safety pattern for choosing and composing verification techniques based on how they contribute to the identification or mitigation of systematic faults known to affect system safety.

**Our Position.** We, as a community, need to figure out the precise conditions under which particular testing and verification techniques "work" (e.g., modeling floating-point numbers as reals, making a small model hypothesis to justify sufficiency of a particular loop unrolling, etc.), and how they are intended to be composed in order to reduce uncertainty about whether software satisfies its specification. We illustrate a particular composition here.

**Example.** In this example, taken from [6], a model checker supports static analysis tools (that produce false positives) by verifying the detected faults. The assurance case is based on a workflow (not shown here) where an initial review report is constructed by running static analysis tools and possibly peer code reviews. Then the program is annotated with the negation of each potential erroneous behavior as a desirable property for the program, and given to a model-checker. If the model-checker is able to verify the property, the corresponding alleged error is removed from the initial review report and not considered an error. If the model-checker finds a violation, the alleged error is confirmed. In this case, a weakest-precondition generation mechanism is applied to find out the environmental conditions (external parameters that are not under the control of the program) under which the program shows the erroneous behavior. These conditions and the error trace are then added to the error description.
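The triage step of this workflow can be sketched as follows. This is a minimal illustration, not the implementation from [6]: the types `Verdict` and `Finding`, the function `triage`, and the two stub callbacks are hypothetical names, and the `UNKNOWN` verdict for an inconclusive model-checker run is an added assumption.

```python
from dataclasses import dataclass, field
from enum import Enum

class Verdict(Enum):
    VERIFIED = "verified"  # property holds: the alleged error is dismissed
    VIOLATED = "violated"  # counterexample found: the alleged error is confirmed
    UNKNOWN = "unknown"    # assumption: the checker gave no conclusive answer

@dataclass
class Finding:
    ident: str
    description: str
    error_trace: list = field(default_factory=list)
    conditions: str = ""

def triage(report, check_property, weakest_precondition):
    """Split an initial review report into confirmed and unresolved findings.

    check_property(finding) -> (Verdict, trace) stands in for the model-checker;
    weakest_precondition(finding, trace) -> str stands in for the
    weakest-precondition generation mechanism.
    """
    confirmed, unresolved = [], []
    for finding in report:
        verdict, trace = check_property(finding)
        if verdict is Verdict.VERIFIED:
            continue  # removed from the report
        if verdict is Verdict.VIOLATED:
            finding.error_trace = trace
            finding.conditions = weakest_precondition(finding, trace)
            confirmed.append(finding)
        else:
            unresolved.append(finding)  # kept for manual review
    return confirmed, unresolved

# Stub callbacks with canned answers, for illustration only.
def stub_checker(finding):
    canned = {
        "F1": (Verdict.VERIFIED, []),
        "F2": (Verdict.VIOLATED, ["idx = -1", "read a[idx]"]),
    }
    return canned.get(finding.ident, (Verdict.UNKNOWN, []))

def stub_wp(finding, trace):
    return "idx < 0"
```

Keeping inconclusive findings in a separate list, rather than silently dropping or confirming them, is one possible design choice; the workflow described above does not prescribe how such cases are handled.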

The paper [6] presents both the assurance case and the confidence argument for the code review workflow. We reproduce only the latter here (see Fig. 5), focusing on reducing uncertainty about the accuracy and consistency of the code property (goal **G2**). False positives generated by static analysis are mitigated using BMC – a method with a completely different verification rationale, thus implementing the safety engineering principle of independence (**J2**). Strategy (**Str2**) explains how errors can be confirmed or dismissed using BMC (goal **G6**). The additional information given by BMC can be used for the mitigation of the error (**C2**).

This approach takes good steps towards mitigating particular assurance deficits using a composition of verification techniques but leaves open several problems. How can we ensure that BMC runs under the same environmental conditions as the static analysis tools? How deeply should the loops be unrolled? What should be done when the model-checker runs out of resources without giving a conclusive answer? And, in general, under what conditions is it safe to trust the "yes" answers of the model-checker?

## **6 Assurance Cases for ML Systems**

Academia and industry are actively building systems using AI and machine learning, including a rapid push for ML in safety-critical domains such as medical devices and self-driving cars. For their successful adoption in society, we need to ensure that they are trustworthy, including obtaining confidence in their behavior and robustness.

**Fig. 6.** A partially developed GSN safety case of pedestrian detector example.

Significant strides have already been made in this space, from extending mature testing and verification techniques to reasoning about neural networks [24,37,47,54] for properties such as safety, robustness and adequate handling of adversarial examples [26,34]. There is active work in designing systems that balance learning under uncertainty and acting safely, e.g., [52] as well as the broad notion of fairness and explainability in AI, e.g., [49].

**Our Position.** We believe that assurance cases remain a unifying view for ML-based systems just as much as for more conventional systems, allowing us to understand how the individual approaches fit into the overall goal of assuring safety and reliability and where there are gaps.

**Example.** We illustrate this idea with an example of a simple pedestrian detector (PD) component used as part of an autonomous driving system. The functions that PD supports consist of detection of objects in the environment ahead of the vehicle, classification of an object as a *pedestrian* or *other*, and localization of the position and extent of the pedestrian (indicated by bounding box). We assume that PD is implemented as a convolutional deep neural network with various stages to perform feature extraction, proposing regions containing objects and classification of the proposed objects. This is a typical approach for two-stage object detectors (e.g., see [50]).

**Fig. 7.** A framework for factors affecting perceptual uncertainty (source: [12]).

As part of a safety critical system, PD contributes to the satisfaction of a top-level safety goal requiring that the vehicle always maintain a safe distance from all pedestrians. Specific safety requirements for PD can be derived from this goal, such as (RQ1) PD misclassification rate (i.e., classifying a pedestrian as "other") must be less than ρ*mc*, (RQ2) PD false positive rate (i.e., classifying any non-pedestrian object or non-object as "pedestrian") must be less than ρ*fp*, and (RQ3) PD missed detection rate (i.e., missing the presence of pedestrian) must be less than ρ*md*. Here, the parameters ρ*mc*, ρ*fp* and ρ*md* must be derived in conjunction with the control system that uses the output from PD to plan the vehicle trajectory.
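To make the three requirements concrete, the sketch below computes the corresponding rates from test-set counts and compares them with given limits. The text does not fix the denominators, so the ones used here are assumptions, as are the names `Counts`, `rates` and `check` and all numbers in the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Counts:
    pedestrians: int               # ground-truth pedestrians in the test set
    missed: int                    # pedestrians with no detection at all
    labelled_other: int            # detected pedestrians classified as "other"
    pedestrian_outputs: int        # all outputs labelled "pedestrian"
    false_pedestrian_outputs: int  # of those, non-pedestrians or non-objects

def rates(c: Counts) -> dict:
    """Observed rates; assumes all denominators are non-zero."""
    detected = c.pedestrians - c.missed
    return {
        # RQ1 (assumed denominator: detected pedestrians)
        "RQ1": c.labelled_other / detected,
        # RQ2 (assumed denominator: all "pedestrian" outputs)
        "RQ2": c.false_pedestrian_outputs / c.pedestrian_outputs,
        # RQ3 (assumed denominator: all ground-truth pedestrians)
        "RQ3": c.missed / c.pedestrians,
    }

def check(c: Counts, limits: dict) -> dict:
    """Map each requirement to True if its observed rate is below the limit."""
    observed = rates(c)
    return {rq: observed[rq] < limits[rq] for rq in limits}

# Hypothetical test results and limits (rho_mc, rho_fp, rho_md).
example = Counts(pedestrians=1000, missed=10, labelled_other=20,
                 pedestrian_outputs=1000, false_pedestrian_outputs=30)
limits = {"RQ1": 0.05, "RQ2": 0.02, "RQ3": 0.02}
result = check(example, limits)
```

With these made-up numbers, RQ1 and RQ3 are met and RQ2 is not. Such a check yields only point estimates on one test set; it does not by itself say how well the rates carry over to operation.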

The partially developed safety case for PD is shown in Fig. 6. The three safety requirements are addressed via the strategy **Str1** and, as expected, testing results are given as evidence of their satisfaction. However, since testing can only provide limited assurance about the behaviour of PD in operation, we use an additional strategy, **Str2**, to argue that a rigorous method was followed to develop PD. Specifically, we follow the framework of [12] for identifying the factors that lead to uncertainty in ML-based perceptual software such as PD.

The framework is defined at a high level in Fig. 7. The left "perception triangle" shows how the perceptual concept (in the case of PD, the concept "pedestrian") can occur in various scenarios in the world, how it is detected using sensors such as cameras, and how this can be used to collect and label examples in order to train an ML component to learn the concept. The perception triangle on the right is similar but shows how the trained ML component can be used during the system operation to make inferences (e.g., perform the pedestrian detection). The framework identifies seven factors that could contribute to uncertainty in the behaviour of the perceptual component. A safety case demonstrating a rigorous development process should provide evidence that each factor has been addressed.

In Fig. 6, strategy **Str2** uses the framework to argue that the seven factors are adequately addressed for PD. We illustrate development of two of these factors here. Scenario coverage (Goal **G-F2**) deals with the fact that the training data must represent the concept in a sufficient variety of scenarios in which it could occur in order for the training to be effective. The argument here first decomposes this goal into different types of variation (**Str3**) and provides appropriate evidence for each. The adequacy of age and ethnicity variation in the data set is supported by census data (**S2**) about the range of these dimensions of variation in the population. The variation in the pedestrian pose (i.e., standing, leaning, crouching, etc.) is supplied by a standard ontology of human postures (**S3**). Finally, evidence that the types are adequate to provide sufficient coverage of variation (completeness) is provided by an expert review (**S4**).

Another contributing factor developed in Fig. 6 is model uncertainty (Goal **G-F6**). Since there is only finite training data, there can be many possible models that are equally consistent with the training data, and the training process could produce any one of them, i.e., there is residual uncertainty whether the produced model is in fact correct. The presence of model uncertainty means that while the trained model may perform well on inputs similar to the training data, there is no guarantee that it will produce the right output for other inputs. Some evidence of good behaviour here can be gathered if there are known properties that partially characterize the concept and can be checked. For example, a reasonable necessary condition for PD is that the object being classified as a pedestrian should be less than 9 ft tall. Another useful property type is an invariant, e.g., a rotated pedestrian image is still a pedestrian. Tools for property checking of neural networks (e.g., [37]) can provide this kind of evidence (**S5**). Another way to deal with model uncertainty is to estimate it directly. Bayesian deep learning approaches [22] can do this by measuring the degree of disagreement between multiple trained models that are equally consistent with the training data. The more the models agree about how to classify a new input, the less model uncertainty is present and the more confident one can be in the prediction. Using this approach on a test data set can provide evidence (**S6**) about the degree of model uncertainty in the model. This approach can also be used during operation to generate a confidence score for each prediction and to apply a fault tolerance strategy that takes a conservative action when the confidence falls below a threshold.
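The idea of using disagreement as a confidence score can be sketched with simple majority voting over the labels produced by several models. This is a crude stand-in for the Bayesian deep learning estimates of [22], not that method itself; the function names, the threshold of 0.8 and the `"conservative_action"` marker are assumptions made for illustration.

```python
from collections import Counter

def agreement(votes):
    """Return the majority label and the fraction of models that voted for it."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

def decide(votes, threshold=0.8):
    """Use the majority label only if enough models agree;
    otherwise signal that a conservative action should be taken."""
    label, confidence = agreement(votes)
    if confidence < threshold:
        return "conservative_action", confidence
    return label, confidence

# Five hypothetical models classify the same input.
unanimous = decide(["pedestrian"] * 5)
split = decide(["pedestrian", "other", "pedestrian", "other", "pedestrian"])
```

Here the unanimous vote is accepted, while the 3-to-2 split falls below the threshold and triggers the conservative action. What counts as "conservative" (for example, treating the object as a pedestrian) depends on the surrounding control system.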

## **7 Summary and Future Outlook**

In this paper, we tried to argue that an assurance case view on establishing system correctness provides a way to unify different components of the software development process and to explicitly manage uncertainty. Furthermore, although our examples came from the world of safety-critical automotive systems, the assurance case view is broadly applicable to a variety of systems, not just those in the safety-critical domain, and includes those constructed by nontraditional means such as ML. This view is especially relevant to much of the research activity being conducted by the ETAPS community since it allows one, in principle, to understand how each method contributes to the overall problem of system assurance.

Most traditional assurance methods aim to build an informal argument, ultimately judged by a human. However, while these are useful for showing compliance to standards and are relatively easy to construct and read, such arguments may not be rigorous, missing essential properties such as completeness, independence, relevance, or a clear statement of assumptions [51]. As a result, fallacies in existing assurance cases are present in abundance [28]. To address this weakness, we argued that building assurance cases should adhere to systematic principles that ensure rigor. Of course, not all arguments can be fully deductive since relevance and admissibility of evidence is often based on human judgment. Yet, an explicit modeling and management of uncertainty in evidence, specifications, and assumptions, as well as the clear justification of each step, can go a long way toward making such arguments valid, reusable, and generally useful in helping produce high quality software systems.

**Challenges and Opportunities.** Achieving this vision has a number of challenges and opportunities. In our work on impact assessment of model change on assurance cases [39,40], we note that even small changes to the system may have significant impact on the assurance case. Because creation of an assurance case is costly, this brittleness must be addressed. One opportunity here is to recognize that assurance cases can be refactored to improve their qualities without affecting their semantics. For example, in Sect. 4, we showed that the LMS safety claim could either be decomposed first by hazards and then by subsystems or vice versa. Thus, we may want to choose the order of decomposition based on other goals, e.g., to minimize the impact of change on the assurance case by pushing the affected subgoals lower in the tree. Another issue is that complex systems yield correspondingly complex assurance cases. Since these must ultimately be judged by humans, we must manage the cognitive load the assurance case puts on the assessor. This creates opportunities for mechanized support, both in terms of querying, navigating and analyzing assurance cases as well as in terms of modularization and reuse of assurance cases.

Evidence composition discussed in Sect. 5 also presents significant challenges. While standards such as DO-178C and ISO 26262 give recommendations on the use of testing and verification, it is not clear how to compose partial evidence or how to use results of one analysis to support another. Focusing on how each technique reduces potential faults in the program, clearly documenting their context of applicability (e.g., the small model hypothesis justifying partial unrolling of loops, properties not affected by approximations of complex program operations and datatypes often done by model-checkers, connections between the modeled and the actual environment, etc.) and ultimately connecting them to reducing uncertainties about whether the system satisfies the essential property are keys to making tangible progress in this area.

Finally, in Sect. 6, we showed how the assurance case view could apply to new development approaches such as ML. Although such new approaches provide benefits over traditional software development, they also create challenges for assurance. One challenge is that analysis techniques used for verification may be immature. For example, while neural networks have been studied since the 1950s, pragmatic approaches to their verification have been investigated only recently [53]. Another issue is that prerequisites for assurance may not be met by the development approach. For example, although they are expressive, neural networks suffer from uninterpretability [41] – that is, it is not feasible for a human to examine a trained network and understand what it is doing. This is a serious obstacle to assurance because formal and automated methods account for only part of the verification process, augmented by reviews. As a result, increasing the interpretability of ML models is an active area of current research.

While all these challenges are significant, the benefit of addressing them is worth the effort. As our world moves towards increasing automation, we must develop approaches for assuring the dependability of the complex systems we build. Without this, we either stall progress or run the risk of endangering ourselves – neither alternative seems desirable.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Software Verification I

# **Tool Support for Correctness-by-Construction**

Tobias Runge<sup>1(✉)</sup>, Ina Schaefer<sup>1</sup>, Loek Cleophas<sup>2,3</sup>, Thomas Thüm<sup>1</sup>, Derrick Kourie<sup>3,4</sup>, and Bruce W. Watson<sup>3,4</sup>

<sup>1</sup> Software Engineering, TU Braunschweig, Braunschweig, Germany, {tobias.runge,i.schaefer,t.thuem}@tu-bs.de

<sup>2</sup> Software Engineering Technology, TU Eindhoven, Eindhoven, The Netherlands

<sup>3</sup> Information Science, Stellenbosch University, Stellenbosch, South Africa, {loek,derrick,bruce}@fastar.org

<sup>4</sup> Centre for Artificial Intelligence Research, CSIR, Pretoria, South Africa

**Abstract.** Correctness-by-Construction (CbC) is an approach to incrementally create formally correct programs guided by pre- and postcondition specifications. A program is created using refinement rules that guarantee the resulting implementation is correct with respect to the specification. Although CbC is supposed to lead to code with a low defect rate, it is not prevalent, especially because appropriate tool support is missing. To promote CbC, we provide tool support for CbC-based program development. We present CorC, a graphical and textual IDE to create programs in a simple while-language following the CbC approach. Starting with a specification, our open source tool supports CbC developers in refining a program by a sequence of refinement steps and in verifying the correctness of these refinement steps using the theorem prover KeY. We evaluated the tool with a set of standard examples on CbC where we reveal errors in the provided specification. The evaluation shows that our tool reduces the verification time in comparison to post-hoc verification.

## **1 Introduction**

*Correctness-by-Construction* (CbC) [12,13,19,23] is a methodology to construct formally correct programs guided by a specification. CbC can improve program development because every part of the program is designed to meet the corresponding specification. With the CbC approach, source code is incrementally constructed with a low defect rate [19], for three main reasons. First, introducing defects is hard because of the structured reasoning discipline that is enforced by the refinement rules. Second, if defects occur, they can be tracked through the refinement structure of specifications. Third, the trust in the program is increased because the program is developed following a formal process [14].

Despite these benefits, CbC is still not prevalent and not applied for large-scale program development. We argue that one reason for this is missing tool support for a CbC-style development process. Another issue is that the programmer mindset is often tailored to the prevalent post-hoc verification approach. CbC has been shown to be beneficial even in domains where post-hoc verification is required [29]. In post-hoc verification, a method is verified against pre- and postconditions. In the CbC approach, we refine the method stepwise, and we can check the method partially after each step since every statement is surrounded by a pair of pre- and postconditions. The verification of refinement steps and Hoare triples reduces the proof complexity since the proof task is split into smaller problems. The specifications and code developed using the CbC approach can be used to bootstrap the post-hoc verification process and allow for an easier post-hoc verification as the method constructed using CbC generally is of a structure that is more amenable to verification [29].

In this paper, we present CorC,<sup>1</sup> a tool designed to develop programs following the CbC approach. We deliberately built our tool on the well-known post-hoc verifier KeY [4] to profit from the KeY ecosystem and future extensions of the verifier. We also add CbC as another application area to KeY, which opens the possibility for KeY users to adopt the CbC approach. This could spread the constructive CbC approach to areas where post-hoc verification is prevalent.

Our tool CorC offers a hybrid textual-graphical editor to develop programs using CbC. The textual editor resembles a normal programming editor, but is enriched with support for pre- and postcondition specifications. The graphical editor visualizes the code, its specification, and the program refinements in a tree-like structure. The developers can switch back and forth between both views. In order to support the correct application of the refinement rules, the tool is integrated with KeY [4] such that proof obligations can be immediately discharged during program development. In a preliminary evaluation, we found benefits of CorC compared to paper-and-pencil-based application of CbC and compared to post-hoc verification.

## **2 Foundations of Correctness-by-Construction**

Classically, CbC [19] starts with the specification of a program as a Hoare triple comprising a precondition, an abstract statement, and a postcondition. Such a triple, say *T*, should be read as a total correctness assertion: if *T* is in a state where the precondition holds and its abstract statement is executed, then the execution will terminate and the postcondition will hold. *T* will be true for a certain set of concrete program instantiations of the abstract program and false for other instantiations. A refinement of *T* is a triple, say *T′*, which is true for a subset of concrete programs that render *T* to be true.

In our work, pre-/post-condition specifications for programs are written in *first-order logic* (FOL). A formula in FOL consists of atomic formulas which are logically connected. An atomic formula is a predicate which evaluates to true or false. Programs in this work are written in the CorC language, which is inspired by the *Guarded Command Language* (GCL) [11] and presented below.

<sup>1</sup> https://github.com/TUBS-ISF/CorC, CorC is an acronym for Correctness-by-Construction.

**Fig. 1.** Refinement rules in CbC [19]

For the concrete instantiation of conditions and assignments, our tool uses a host language. We chose Java, but other languages are also possible.

To create programs using CbC, we use refinement rules. A Hoare triple is refined by applying rules, which introduce CorC language statements, so that a concrete program is created. The concrete program obtained by refinement is guaranteed to be correct by construction, provided that the correctness-preserving refinement steps have been accurately applied. In Fig. 1, we present the statements and refinement rules used in CbC and our tool.

*Skip.* A skip or empty statement is a statement that does not alter the state of the program (i.e., it does nothing) [11,19]. This means a Hoare triple with a skip statement evaluates to true if the precondition implies the postcondition.

*Assignment.* An assignment statement assigns an expression of type T to a variable, also of type T. In the tool, we use a Java-like assignment (x = y). To refine a Hoare triple {P} S {Q} with an assignment statement, the assignment rule is used. This rule replaces the abstract statement S by an assignment {P} x = E {Q} iff P implies Q[x := E].

*Composition.* A composition statement is a statement which splits one abstract statement into two. A Hoare triple {P} S {Q} is split to {P} S<sub>1</sub> {M} and {M} S<sub>2</sub> {Q} in which S is refined to S<sub>1</sub> and S<sub>2</sub>. M is an intermediate condition which evaluates to true after S<sub>1</sub> and before S<sub>2</sub> is executed [11].

*Selection.* Selection in our CorC language works as a switch statement. It refines a Hoare triple {P} S {Q} to {P} **if** G<sub>1</sub> → S<sub>1</sub> **elseif** ... G<sub>n</sub> → S<sub>n</sub> **fi** {Q}. The guards G<sub>i</sub> are evaluated, and the sub-statement S<sub>i</sub> of the *first* satisfied guard is executed. We use a switch-like statement so that every sub-statement has an associated guard for further reasoning. The selection refinement rule can only be used if the precondition P implies the disjunction of all guards so that at least one sub-statement could be executed.

*Repetition.* The repetition statement {P} **do** [I, V] G → S **od** {Q} works like a while loop in other languages. If the loop guard G evaluates to true, the associated loop statement S is executed. The repetition statement is specified with an invariant I and a variant V. To refine a Hoare triple {P} S {Q} with a repetition statement, (1) the precondition P has to imply the invariant I of the repetition statement, (2) the conjunction of the invariant and the negation of the loop guard G has to imply the postcondition Q, and (3) the loop body has to preserve the invariant by showing that {I ∧ G} S {I} holds. To verify termination, we have to show that the variant V monotonically decreases in each loop iteration and has 0 as a lower bound.

*Weaken precondition.* The precondition of a Hoare triple can be weakened if necessary. The weaken precondition rule replaces the precondition P with a new one P′ only if P implies P′ [12].

*Strengthen postcondition.* To strengthen a postcondition, the strengthen postcondition rule can be used. A postcondition Q is replaced by a new one Q′ only if Q′ implies Q [12].

*Subroutine.* A subroutine can be used to split a program into smaller parts. We use a simple subroutine call where we prohibit side effects and parameters. A triple {P} S {Q} can be refined to a subroutine {P′} *Sub* {Q′} if the precondition P′ of the subroutine is equal to the precondition P of the refined statement and the postcondition Q′ of the subroutine is equal to the postcondition Q of the refined statement. The subroutine can be constructed as a separate CbC program to verify that it satisfies the specification. The Hoare triple {P′} *Sub* {Q′} is the starting point to construct a program using CbC.

## **3 Correctness-by-Construction by Example**

To introduce the programming style of CbC, we demonstrate the construction of a linear search algorithm using CbC [19]. The linear search problem is defined as follows: We have an integer array a of some length, and an integer variable x. We try to find an element in the array a which has the same value as the variable x, and we return the index i where the (last) element x was found, or −1 if the element is not in the array.

To construct the algorithm, we start by concretizing the pre- and postcondition of the algorithm. Before the algorithm is executed, we know that we have an integer array. Therefore, we specify a≠null ∧ a.length≥0 as precondition P. The postcondition states that if the index i is greater than or equal to zero, the element is found at the returned index i (Q := (i≥0 ⟹ a[i]=x)).

**Fig. 2.** Refinement steps for the linear search algorithm

Our algorithm traverses the array in reverse order and checks for each index whether the value is equal to x. In this case, the index is returned. To create this algorithm, we construct an invariant I for the loop:

I := ¬appears(a, x, i + 1, a.length) ∧ i ≥ −1 ∧ i < a.length

The invariant splits the array into two parts: a part from i + 1 to a.length in which x is not contained, and a part from zero to i which is not checked yet. In every iteration, the next index of the array is checked. The predicate appears(a, x, l, h) asserts that x occurs in array a inside the range from l (included) to h (excluded). The predicate can be translated to FOL as ∃i : (i ≥ l ∧ i < h ∧ a[i] = x).

We can use the CbC refinement rules to implement linear search. The refinement steps for the example are shown in Fig. 2 and numbered from 1 to 4. To create a loop in the program, we need to initialize a loop counter variable to establish the invariant. Therefore, we split the program by introducing a composition statement (step 1 in Fig. 2). The invariant I is used as intermediate condition (i.e., M := I), because it has to be true after the initialization and before the first loop step. The statement st1 is refined to an assignment statement (step 2). We initialize i with a.length − 1 to start at the end of the array. This assignment satisfies the intermediate condition I where i is replaced by a.length − 1: the range of appears is empty, so appears evaluates to false and its negation in I holds. To refine the second statement (st2), we use the repetition refinement rule (step 3). As long as x is not found, we iterate through the array. As guard of the repetition, we use (i≥0 ∧ a[i]≠x). The invariant of the repetition is the invariant I introduced above. The variant V is i + 1. To verify that this refinement is valid, we have to verify that the precondition of the repetition statement implies the invariant, and that the invariant and the negated guard imply the postcondition of the repetition (cf. Rule 5). Both are valid because the precondition is equal to the invariant and the postcondition of the repetition statement (in this case it is Q) is logically equivalent to the negated guard. The last step is to refine the abstract loop statement (loopSt) (step 4). We use an assignment to decrease i and get the final program. We can verify that the invariant holds after each loop iteration. The program terminates because the variant decreases in every step and it is always greater than or equal to zero.
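The program that results from the four refinement steps can be written down in plain Java. The sketch below is an illustration and not output of CorC: the class name `LinearSearchSketch` and the helper `check` are hypothetical, and the invariant I, the variant i + 1, and the postcondition Q are evaluated at run time instead of being proven with KeY.

```java
/** Sketch of the constructed linear search with I, V, and Q checked at run time. */
final class LinearSearchSketch {
    /** appears(a, x, l, h): x occurs in a at some index k with l <= k < h. */
    static boolean appears(int[] a, int x, int l, int h) {
        for (int k = l; k < h; k++) {
            if (a[k] == x) {
                return true;
            }
        }
        return false;
    }

    /** Invariant I := !appears(a, x, i + 1, a.length) and i >= -1 and i < a.length. */
    static boolean invariant(int[] a, int x, int i) {
        return !appears(a, x, i + 1, a.length) && i >= -1 && i < a.length;
    }

    static int search(int[] a, int x) {
        check(a != null, "precondition P");
        int i = a.length - 1;                          // step 2: initialization
        check(invariant(a, x, i), "I after initialization");
        while (i >= 0 && a[i] != x) {                  // step 3: repetition with guard G
            int variantBefore = i + 1;
            i = i - 1;                                 // step 4: loop body
            check(invariant(a, x, i), "I preserved by the loop body");
            check(i + 1 < variantBefore && i + 1 >= 0, "variant decreases and stays >= 0");
        }
        check(i < 0 || a[i] == x, "postcondition Q");  // Q: i >= 0 implies a[i] = x
        return i;
    }

    private static void check(boolean condition, String what) {
        if (!condition) {
            throw new IllegalStateException("violated: " + what);
        }
    }

    public static void main(String[] args) {
        System.out.println(search(new int[] {4, 7, 7, 1}, 7));
        System.out.println(search(new int[] {4, 7, 7, 1}, 9));
    }
}
```

Because the array is traversed from the end, the index of the last occurrence is returned, and −1 is returned when the loop runs past index 0 without finding x.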

## **4 Tool Support in CorC**

CorC extends KeY's application area by enabling CbC to spread the constructive engineering to areas where post-hoc verification is prevalent. KeY programmers can use both approaches to construct formally correct programs. By using CorC, they develop specification and code that can bootstrap the post-hoc verification. The CorC tool<sup>2</sup> is realized as an Eclipse plug-in in Java. We use the Eclipse Modeling Framework (EMF)<sup>3</sup> to specify a CbC meta model. This meta model is used by two editor views, a textual and a graphical editor. The Hoare triple verification is implemented by the deductive program verification tool KeY [4]. In the following list, we summarize the features of CorC.


## **4.1 Graphical Editor**

The graphical editor represents CbC-based program refinement by a tree structure. A node represents the Hoare triple of a specific CorC language statement. Figure 3 presents the linear search algorithm of Sect. 3 in the graphical editor. The structure of the tree is the same as in Fig. 2. The additional nodes on the right specify used program variables including their type and global invariant conditions. The global invariant conditions are added to every pre- and postcondition of Hoare triples to simplify the construction of the program. In the example, we specify the array a and the range of variable i to support the verification, as KeY requires this range to be explicit for verification.

**Fig. 3.** Linear search example in the graphical editor

<sup>2</sup> https://github.com/TUBS-ISF/CorC.

<sup>3</sup> https://eclipse.org/emf/.

The root node of the tree shows the abstract Hoare triple for the overall program with a symbolic name for the abstract statement. In every node, the pre- and postcondition are specified on the left and right of the node under the corresponding header. A composition statement node, the second statement of the tree, contains the pre- and postcondition and additionally defines an intermediate condition. The intermediate condition is the middle term in the bottom line. Both abstract sub-statements of the composition have a symbolic name and can be further refined by adding a connection to another node (i.e., creating a parent-child relation). The repetition node contains fields to specify the invariant, the guard and the variant of the repetition. These fields are in the middle row. The pre- and postcondition are associated to the inner loop statement. An assignment node (cf. both leaf nodes of the figure) contains the precondition, the assignment, and the postcondition. The representations of the nodes for the refinements not illustrated in this example are similar.

Refinement steps are represented by edges. The pre- and postconditions are propagated from parents to their children on drawing the parent/child relation. We explicitly show the propagated conditions in a node to improve readability. The propagated conditions from the parent are unmodifiable because refinement rules determine explicitly how conditions are propagated. The exceptions are the rules to weaken the precondition or strengthen the postcondition. Here, the conditions can be overridden. At the repetition statement, we only depict the pre-/postconditions of the inner loop statement to reduce the size of this node. The pre-/postconditions of the parent node (in our example the composition statement) are not shown explicitly, but they are propagated internally to verify that the repetition refinement rule is satisfied. To visualize the verification status, the nodes have a green border if proven, a red one otherwise.
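How conditions travel from a parent to its children can be sketched for the composition rule. The code below is a simplified illustration under stated assumptions: `RefinementSketch`, `Triple`, and `compose` are hypothetical names and do not reflect CorC's EMF meta model, and conditions are kept as uninterpreted strings.

```java
/** Hypothetical, simplified stand-in for refinement-tree nodes; not CorC's EMF meta model. */
final class RefinementSketch {
    /** A Hoare triple {pre} name {post}; conditions are kept as plain strings. */
    static final class Triple {
        final String pre;
        final String name;
        final String post;

        Triple(String pre, String name, String post) {
            this.pre = pre;
            this.name = name;
            this.post = post;
        }

        @Override
        public String toString() {
            return "{" + pre + "} " + name + " {" + post + "}";
        }
    }

    /** Composition: {P} S {Q} is refined to {P} S1 {M} and {M} S2 {Q}. */
    static Triple[] compose(Triple parent, String intermediate) {
        // The first child inherits the parent's precondition and ends in M.
        Triple first = new Triple(parent.pre, parent.name + "1", intermediate);
        // The second child starts in M and inherits the parent's postcondition.
        Triple second = new Triple(intermediate, parent.name + "2", parent.post);
        return new Triple[] { first, second };
    }

    public static void main(String[] args) {
        Triple root = new Triple("P", "st", "Q");
        for (Triple child : compose(root, "I")) {
            System.out.println(child);
        }
    }
}
```

The children are derived entirely from the parent and the intermediate condition, which is why the propagated conditions need not be editable in the child nodes.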

By showing the Hoare triples explicitly, problems in the program can be localized. If some leaf node cannot be proven, the user has to check the assignment and the corresponding pre-/postcondition. If an error occurred, the conditions on the refinement path up to pre-/postcondition of the starting Hoare triple can be altered. Other paths do not need to be checked. To prove the program correct, we have to prove that the refinement is correct. Aside from the side conditions of refinement rules (cf. iff conditions in refinement rules), only the leaf nodes of the refinement tree which contain basic Hoare triples with skip or assignment statements need to be verified by a prover, while all composite statements are correct by construction of their conditions.

To support the user in developing intermediate conditions for composition statements, our tool can compute the weakest precondition from a postcondition and a concrete assignment by using the KeY theorem prover. So, the user can create a specific assignment statement and generate the intermediate conditions afterwards. We also support modularization, to cover cases where algorithms become too large. Sub-algorithms can be created using CbC in other CorC programs. We introduce a simple subroutine rule which can be used as a leaf node in the editor. The subroutine has a name and it is connected to a second diagram with the same name as the subroutine. This subroutine call is similar to a classic method call. It can be used to decompose larger CbC developments to multiple smaller programs.
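For an assignment, the weakest precondition follows from the standard substitution rule wp(x = E, Q) = Q[x := E]. As a worked illustration (how KeY computes the condition internally is not shown here), take the assignment i = a.length − 1 and the invariant I of the linear search example as postcondition, assuming a ≠ null so that a.length is defined:

```latex
\begin{align*}
wp(i = a.\mathit{length} - 1,\; I)
  &\;\equiv\; I[i := a.\mathit{length} - 1] \\
  &\;\equiv\; \lnot \mathit{appears}(a, x, a.\mathit{length}, a.\mathit{length})
      \;\land\; a.\mathit{length} - 1 \geq -1
      \;\land\; a.\mathit{length} - 1 < a.\mathit{length} \\
  &\;\equiv\; \mathit{true} \;\land\; a.\mathit{length} \geq 0 \;\land\; \mathit{true} \\
  &\;\equiv\; a.\mathit{length} \geq 0
\end{align*}
```

The first conjunct is true because the range of appears is empty, and the remaining condition a.length ≥ 0 is implied by the precondition P of the example.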

## **4.2 Textual Editor**

The textual editor is an editor for the CorC programming language described above. The user writes code by using keywords for the specific statements and enriches the code with conditions, such as invariants or intermediate conditions, and assignments in our CorC syntax. The syntax of the composed statements in the textual editor is shown in Fig. 4. In the GlobalConditions declaration, we enumerate the needed global conditions separated with a comma. The used variables are enumerated after the JavaVariables keyword.

**Fig. 4.** Syntax of statements in textual editor

The linear search example program presented in Sect. 3 is shown in the syntax of CorC in Listing 1. The program starts with keyword Formula. The pre- and postcondition of the abstract Hoare triple are written after the pre: and post: keywords. The abstract statement of the Hoare triple is refined to a composition statement in lines 3–16. The statements are surrounded by curly brackets to establish the refinement structure. We have the first statement in lines 4–6, the intermediate condition in line 7 and the second statement in lines 8–15. The first statement is refined to an assignment (Line 5). The refinement is done by introducing an assignment in Java syntax (i = a.length − 1;). The second statement is refined to a repetition statement (cf. the syntax of a repetition statement in Fig. 4). We specify the guard, the invariant, and the variant. Finally, the single statement of the loop body is refined to an assignment in Line 13.

```
1 Formula "linearSearch"
2 pre: {"true"}
3 {
4 {
5 i=a.length -1;
6 }
7 intm: ["! appears(a, x, i+1, a.length)"]
8 {
9 while ("i>=0 & a[i]!=x")
10 inv: ["! appears(a, x, i+1, a.length)"]
11 var: ["i+1"] do
12 {
13 i=i-1;
14 } od
15 }
16 }
17 post: {"i>=0 -> a[i]=x"}
18
19 GlobalConditions
20 conditions {"a!=null", "a.length >=0",
21 "i>=-1", "i<a.length"}
22
23 JavaVariables
24 variables {"int[] a", "int x", "int i"}
```
**Listing 1.** Linear search example in the textual editor

As in the graphical editor, pre-/postconditions are propagated top-down from a parent to a child statement. For example, the intermediate condition of a composition statement, which is the postcondition of the first sub-statement and the precondition of the second, appears only once in the editor (e.g., Line 7). To support the user, we implemented syntax highlighting and content assist. When starting to write a statement, a user may employ auto-completion where the statements are inserted following the syntax in Fig. 4. The user can specify the conditions, then the next statement can be refined. The editor also automatically checks the syntax and highlights syntax errors. Information markers are used to indicate statements which are not proven yet. For example, the Hoare triple of the assignment statement (i = a.length − 1) in Listing 1 has to be verified, and CorC marks the statement according to the proof completion results.

```
\javaSource "src";
\include "helper.key";
\programVariables {int x;}
\problem {
  (x = 0) -> \<{x=x+1;}\> (x = 1)
}
```
**Listing 2.** KeY problem file

## **4.3 Verification of CorC Programs**

To prove that the refined program is correct, we have to prove the side conditions of the refinements (e.g., prove that an assignment satisfies the pre-/postcondition specification). This reduces the proof complexity because the challenge to prove a complete program is decomposed into smaller verification tasks. The intermediate Hoare triples are verified indirectly through the soundness of the refinement rules and the propagation of the specifications from parent nodes to child nodes [19]. Side conditions occur in all refinements (cf. iff conditions in refinement rules). These side conditions, such as the termination of repetition statements or that at least one guard in a selection has to evaluate to true, are proven in separate KeY files.

For the proof of concrete Hoare triples, we use the deductive program verifier KeY [4]. Hoare triples are transformed to KeY's dynamic logic syntax. The syntax of KeY problem files is shown in Listing 2. Using the keyword javaSource, we specify the path to Java helper methods which are called in the specifications. These methods have to be verified independently with KeY. A KeY helper file, where the users can define their own FOL predicates for the specification, is included with the keyword include. For example, in CorC a predicate *appears*(*a, x, l, h*) (cf. the linear search example) can be used which is specified in the helper file as a FOL formula. The variables used in the program are listed after the keyword programVariables. After problem, we define the Hoare triple to be proven, which is translated to dynamic logic as used by KeY. KeY problem files are verified by KeY. As we are only verifying simple Hoare triples with skip or assignment statements, KeY is usually able to close the proofs automatically if the Hoare triple is valid.

To verify total correctness of the program, we have to prove that all repetition statements terminate. The termination of repetition statements is shown by proving that the variants in the program monotonically decrease and are bounded. Without loss of generality, we assume this bound to equal 0, as this is what KeY requires. This is done by specifying the problem in the KeY file in the following way: (invariant & guard) -> {var0:=var} \<{std}\> (invariant & var<var0 & var>=0). The code of the loop body is specified at std to verify that after one iteration of the loop body the variant var is smaller than before but greater than or equal to zero.
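The two formula shapes described above, a Hoare triple translated to dynamic logic and the termination obligation for one loop iteration, can be assembled as strings. The following is a minimal sketch and not CorC's generator: `KeyProblemSketch` and its methods are hypothetical names, the extra parentheses around the inserted conditions are an addition made here for safety, and the sketch assumes that `var0` is declared among the program variables of the generated problem file.

```java
/** Hypothetical generator for KeY problem formulas in the shape described in the text. */
final class KeyProblemSketch {
    /** {pre} stmt {post} as dynamic logic: (pre) -> \<{stmt}\> (post). */
    static String hoareTriple(String pre, String stmt, String post) {
        return "(" + pre + ") -> \\<{" + stmt + "}\\> (" + post + ")";
    }

    /** Termination obligation: after one execution of body, var is smaller than before and >= 0. */
    static String termination(String inv, String guard, String body, String var) {
        return "((" + inv + ") & (" + guard + ")) -> {var0:=" + var + "} \\<{" + body + "}\\> (("
                + inv + ") & (" + var + ")<var0 & (" + var + ")>=0)";
    }

    public static void main(String[] args) {
        // The triple from the KeY problem file shown in Listing 2.
        System.out.println(hoareTriple("x = 0", "x=x+1;", "x = 1"));
        // The termination obligation for the loop of the linear search example.
        System.out.println(termination(
                "!appears(a, x, i+1, a.length)", "i>=0 & a[i]!=x", "i=i-1;", "i+1"));
    }
}
```

Whether KeY accepts the generated text unchanged depends on the surrounding problem file (included predicates, declared variables), which this sketch does not produce.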

To verify Hoare triples in the graphical editor, we implemented a menu entry. The user can right-click on a statement and start the automatic proof. If the proof is not closed, the user can interact with the opened KeY interface. To prove Hoare triples in the textual editor, we automatically generate all needed problem files for KeY whenever the user saves the editor file. The proof of the files is started using a menu button. The user gets feedback which triples are not proven by means of markers in the editor.

## **4.4 Implementation as Eclipse Plugin**

We extended the Eclipse Modeling Framework with plugins to implement the two editors. We have created a meta model of the CbC language to represent the required constructs (i.e., statements with specification). The statements can be nested to create the CbC refinement hierarchy. The graphical and the textual editor are projections on the same meta model. The graphical editor is implemented using the framework Graphiti.<sup>4</sup> It provides functionality to create nodes and to associate them with domain elements, such as statements and specifications. The nodes can be added from a palette at the side of the editor, so no incorrect statement with its associated specification can be created. We implemented editing functionality to change the text in the node; the background model is changed simultaneously. Graphiti also provides the possibility to update nodes (e.g., to propagate pre- and postconditions) if we connect those nodes by refinement edges. The refinement is checked for compliance with the CbC rules.

The textual editor is implemented using XText.<sup>5</sup> We created a grammar covering every statement and the associated specification. If the user writes a program, the text is parsed and translated to an instance of the meta model. If a program is created in one editor, a model (an instance of our meta model) of the program is created in the background. We can easily transform one view into the other. The transformation is a generation step and not a live synchronization between both views, but it is carried out invisibly for the user when changing the views.

<sup>4</sup> https://eclipse.org/graphiti/.

<sup>5</sup> https://eclipse.org/Xtext/.


**Table 1.** Evaluation of the example programs

(GE) Graphical Editor, (TE) Textual Editor, (PhV) Post-hoc Verification

In implementing CorC, we considered the exchangeability of the host language. The specifications and assignments are saved as strings in the meta model. They are checked by a parser to comply with Java. This parser could be exchanged to support a different language. The verification is done by generating KeY files which are then evaluated by KeY. Here, we have to exchange the generation of the files if another theorem prover should be integrated. The information of the meta model may have to be adapted to fit the needs of the other prover. We also have to implement a programmatic call to the other prover.

## **5 Evaluation**

The tool support offers new chances to evaluate CbC versus post-hoc verification. We quantitatively compare the development and verification of programs with CorC and with post-hoc verification. This is to check the hypothesis that the verification of algorithms is faster with CorC than with post-hoc verification. We created the first eight algorithms from the book by Kourie and Watson [19] in our graphical editor. For comparison purposes, we also wrote each example as a plain Java program with JML specifications in order to directly verify it with KeY. The specifications are the same as in CorC. We measured the verification time and the proof nodes that KeY needed to close the proofs for both approaches. The results of the evaluation are presented in Table 1 (verification time rounded).

**Fig. 5.** Proof time of CbC and post-hoc verification in logarithmic scale

The algorithms have 5 to 14 nodes in the graphical editor and 12 to 26 lines of code in the textual editor. The Java version with a JML specification always has fewer lines (between 8% and 29% smaller). The additional specifications, such as the intermediate conditions of composition statements, and the global invariant conditions and variables cause more lines of code in the CbC program.

The verification of the eight algorithms worked nearly without problems. We verified 7 out of 8 examples within CorC. In the cases without problems, every Hoare triple and the termination of the loops could be proven. We had to prove fewer Hoare triples than nodes in the editor, as not every node has to be proven separately. Composition nodes are proven indirectly through the refinement structure. For *exponentiation*, *logarithm*, and *factorial*, we had to implement recursive helper methods which are used in the specification. Therefore, the programs impose upper bounds for integers to shorten the proof. The *binary search* algorithm could not be verified automatically in KeY using post-hoc verification or CorC. In each step, when the element is not found, the algorithm halves the array. KeY could not prove that the searched element is in the new boundaries because verification problems with arithmetic division are hard to prove for KeY automatically.

In the case of measured proof nodes, *maximum element* needs slightly fewer nodes with post-hoc verification than with CbC. In the other cases, the post-hoc proofs are 3% to 854% larger than the proofs for the algorithms constructed with CbC. The largest difference was measured for the *pattern matching* algorithm, where the CbC proof needs about a ninth of the nodes.

The verification time is visualized in Fig. 5. The time is measured in milliseconds and scaled logarithmically. The proofs for the CbC approach are always faster, which indicates lower proof complexity. For *maximum element*, *exponentiation*, *logarithm* and *factorial*, post-hoc verification requires between 22% and 60% more time. The difference increases for *Dutch flag* and *linear search* to 137% and 176%, respectively. Algorithm *pattern matching* has the biggest difference. Here, the CbC approach needs nearly a minute, but the post-hoc approach needs over 24 min. To verify our hypothesis, we apply the non-parametric paired Wilcoxon test [30] with a significance level of 5%. We can reject the null hypothesis that CbC verification and post-hoc verification have no significant difference in verification time (p-value = 0.007813). This rejection of the null hypothesis is empirical evidence for our hypothesis that verification is faster with CorC than with post-hoc verification.
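The reported p-value can be related to the exact distribution of the Wilcoxon signed-rank statistic. The sketch below enumerates all sign patterns for a small sample; it assumes a one-sided exact test without ties or zero differences, which is an assumption about the test variant and not stated in the text. The class `WilcoxonSketch` is a hypothetical name, and the differences in `main` are made-up placeholder values, not the measured times.

```java
/** Exact one-sided Wilcoxon signed-rank test by enumeration; assumes no zeros and no tied magnitudes. */
final class WilcoxonSketch {
    static double exactOneSidedP(double[] differences) {
        int n = differences.length;
        int[] rank = new int[n];
        int observed = 0;
        for (int i = 0; i < n; i++) {
            // Rank of |d_i| among all absolute differences (1 = smallest).
            rank[i] = 1;
            for (int j = 0; j < n; j++) {
                if (Math.abs(differences[j]) < Math.abs(differences[i])) {
                    rank[i]++;
                }
            }
            // The test statistic is the sum of the ranks of the positive differences.
            if (differences[i] > 0) {
                observed += rank[i];
            }
        }
        // Under the null hypothesis every sign pattern is equally likely.
        int atLeastAsExtreme = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            int sum = 0;
            for (int i = 0; i < n; i++) {
                if (((mask >> i) & 1) == 1) {
                    sum += rank[i];
                }
            }
            if (sum >= observed) {
                atLeastAsExtreme++;
            }
        }
        return (double) atLeastAsExtreme / (1 << n);
    }

    public static void main(String[] args) {
        // Made-up differences (post-hoc time minus CbC time), all positive, for seven pairs.
        double[] placeholder = {12, 30, 45, 80, 150, 400, 90000};
        System.out.println(exactOneSidedP(placeholder));
    }
}
```

With seven pairs that all favor the same approach, only one of the 2<sup>7</sup> sign patterns is as extreme as the observation, so the exact one-sided p-value is 1/128 = 0.0078125. This agrees with the reported value after rounding, which suggests, but does not confirm, that a one-sided exact test on the seven verified examples was used.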

With our tool support, we were able to compare the CbC approach with post-hoc verification. For our examples, we found that the verification effort is reduced significantly, which indicates a reduced proof complexity. It is worthwhile to further investigate the CbC approach, also to profit from synergistic effects in combination with post-hoc verification. As we built CorC on top of KeY, the post-hoc verification of programs constructed with CorC is feasible.

An advantage of CorC is the overview on all Hoare triples during development. In this way, we found some specifications where descriptions in the book by Kourie and Watson [19] were not precise enough to verify the problem in KeY. For example, in the *pattern matching* algorithm, we had to verify two nested loops. At one point, we had to verify that the invariant of the inner loop implies the invariant of the outer loop. This was not possible, so we extended the invariant of the inner loop to be the conjunction of both invariants. In the book of Kourie and Watson [19], this conjunction of both invariants was not explicitly used.

## **6 Related Work**

We compare CorC to other programming languages and tools using specification or refinements. The programming language Eiffel is an object-oriented programming language with a focus on design-by-contract [21,22]. Classes and methods are annotated with pre-/postconditions and invariants. Programs written in Eiffel can be verified using AutoProof [18,28]. The verification tool translates the program with assertions to a logic formula. An SMT solver proves the correctness and returns the result. Spec# is a similar tool for specifying C# programs with pre-/postcondition contracts. These programs can be verified using Boogie. The code and specification are translated to an intermediate language (BoogiePL) and verified [5,6]. VCC [8] is a tool to annotate and verify C code. For this purpose, it reuses the Spec# tool chain. VeriFast [16] is another tool to verify C and Java programs with the help of contracts. The contracts are written in separation logic (a variant of Hoare logic). As in Eiffel, the focus of Spec#, VCC, and VeriFast is on post-hoc verification and debugging failed proof attempts.

The Event-B framework [2] is a related CbC approach. Automata-based systems including a specification are refined to a concrete implementation. Atelier B [1] implements the B method by providing an automatic and interactive prover. Rodin [3] is another tool implementing the Event-B method. The main difference to CorC is that CorC works on code and specifications rather than on automata-based systems.

ArcAngel [25] is a tool supporting Morgan's refinement calculus. Rules are applied to an initial specification to produce a correct implementation. The tool implements a tactic language for refinements to apply a sequence of rules. In comparison to our tool, ArcAngel does not offer a graphical editor to visualize the refinement steps. Another difference is that ArcAngel creates a list of proof obligations which have to be proven separately. CRefine [26] is a related tool for the Circus refinement calculus, a calculus for state-rich reactive systems. Like our tool, CRefine provides a GUI for the refinement process. The difference is that we specify and implement source code, but they use a state-based language. ArcAngelC [10] is an extension to CRefine which adds refinement tactics.

The tools iContract [20] and OpenJML [9] apply design-by-contract. They use a special comment tag to insert conditions into Java code. These conditions are translated to assertions and checked at runtime which is a difference to our tool because no formal verification is done. DBC-Python is a similar approach for the Python language which also checks assertions at runtime [27].

To verify the CbC program, we need a theorem prover for Hoare triples, such as KeY [4]. There are other theorem provers which could be used (e.g., Coq [7] or Isabelle/HOL [24]). The Tecton Proof System [17] is a related tool to structure and interactively prove Hoare logic specifications. The proofs are represented graphically as a set of linked trees. These interactive provers do not fit our needs because we want to automate the verification process. KeY provides a symbolic execution debugger (SED) that represents all execution paths with specifications of the code to the verification [15]. This visualization is similar to our tree representation of the graphical editor. The SED can be used to debug a program if an error occurs during the post-hoc verification process.

## **7 Conclusion and Future Work**

We implemented CorC to support the Correctness-by-Construction process of program development. We created a textual and a graphical editor that can be used interchangeably to enable different styles of CbC-based program development. The program and its specification are written in one of the editors and can be verified using KeY. This reduces the proof complexity with respect to post-hoc verification. We extended the KeY ecosystem with CorC. CorC opens the possibility to utilize CbC in areas where post-hoc verification is used as programmers could benefit from synergistic effects of both approaches. With tool support, CbC can be studied in experiments to determine the value of using CbC in industry.

For future work, we want to extend the tool support, and we want to evaluate empirically the benefits and drawbacks of CorC. To extend the expressiveness, we implement a rule for methods to use method calls in CorC. These methods have to be verified independently by CorC/KeY. We could investigate whether the method call rules of KeY can be used for our CbC approach. Another future work is the inference of conditions to reduce the manual effort. Postconditions can be generated automatically for known statements by using the strongest postcondition calculus. Invariants could be generated by incorporating external tools. As mentioned earlier, other host languages and other theorem provers can be integrated in our IDE.

The second work package for future work comprises the evaluation with a user study. We could compare the effort of creating and verifying algorithms with post-hoc verification and with our tool support. The feedback can be used to improve the usability of the tool.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Automatic Modeling of Opaque Code for JavaScript Static Analysis**

Joonyoung Park<sup>1,2</sup>, Alexander Jordan<sup>1</sup>, and Sukyoung Ryu<sup>2</sup>

<sup>1</sup> Oracle Labs Australia, Brisbane, Australia {joonyoung.p.park,alexander.jordan}@oracle.com

<sup>2</sup> KAIST, Daejeon, Republic of Korea {sryu.cs,gmb55}@kaist.ac.kr

**Abstract.** Static program analysis often encounters problems in analyzing library code. Most real-world programs use library functions intensively, and library functions are usually written in different languages. For example, static analysis of JavaScript programs requires analysis of the standard built-in library implemented in host environments. A common approach to analyze such *opaque code* is for analysis developers to build models that provide the semantics of the code. Models can be built either manually, which is time consuming and error prone, or automatically, which may limit application to different languages or analyzers. In this paper, we present a novel mechanism to support automatic modeling of opaque code, which is applicable to various languages and analyzers. For a given static analysis, our approach automatically computes analysis results of opaque code via dynamic testing during static analysis. By using testing techniques, the mechanism does not guarantee *sound* over-approximation of program behaviors in general. However, it is fully automatic, is scalable in terms of the size of opaque code, and provides more precise results than conventional over-approximation approaches. Our evaluation shows that although not all functionalities in opaque code can (or should) be modeled automatically using our technique, a large number of JavaScript built-in functions are approximated soundly yet more precisely than existing manual models.

**Keywords:** Automatic modeling · Static analysis · Opaque code · JavaScript

## **1 Introduction**

Static analysis is widely used to optimize programs and to find bugs in them, but it often faces difficulties in analyzing library code. Since most real-world programs use various libraries usually written in different programming languages, analysis developers should provide analysis results for libraries as well. For example, static analysis of JavaScript apps involves analysis of the built-in functions implemented in host environments like the V8 runtime system written in C++.

A conventional approach to analyze such *opaque code* is for analysis developers to create models that provide the analysis results of the opaque code. Models approximate the behaviors of opaque code, and they are often tightly integrated with specific static analyzers to support precise abstract semantics that are compatible with the analyzers' internals.

Developers can create models either manually or automatically. Manual modeling is complex, time consuming, and error prone because developers need to consider all the possible behaviors of the code they model. In the case of JavaScript, the number of APIs to be modeled is large and ever-growing as the language evolves. Thus, various approaches have been proposed to model opaque code automatically. They create models either from specifications of the code's behaviors [2,26] or using dynamic information during execution of the code [8,9,22]. The former approach heavily depends on the quality and format of available specifications, and the latter approach is limited to the capability of instrumentation or specific analyzers.

In this paper, we propose a novel mechanism to model the behaviors of opaque code to be used by static analysis. While existing approaches aim to create general models for the opaque code's behaviors, which can produce analysis results for all possible inputs, our approach computes specific results of opaque code during static analysis. This on-demand modeling is specific to the abstract states of a program being analyzed, and it consists of three steps: sampling, run, and abstraction. When static analysis encounters opaque code with some abstract state, our approach generates samples that are a subset of all possible inputs of the opaque code by concretizing the abstract state. After evaluating the code using the concretized values, it abstracts the results and uses them during analysis. Since the sampling generally covers only a small subset of infinitely many possible inputs to opaque code, our approach does not guarantee the soundness of the modeling results, just like other automatic modeling techniques.

The sampling strategy should select well-distributed samples to explore the opaque code's behaviors as much as possible and to avoid redundant ones. Generating too few samples may miss too many behaviors, while redundant samples can cause performance overhead. As a simple yet effective way to control the number of samples, we propose to use *combinatorial testing* [11].

We implemented the proposed automatic modeling as an extension of SAFE, a JavaScript static analyzer [13,17]. For opaque code encountered during analysis, the extension generates concrete inputs from abstract states, and executes the code dynamically using the concrete inputs via a JavaScript engine (Node.js in our implementation). Then, it abstracts the execution results using the operations provided by SAFE such as lattice-*join* and our over-approximation, and resumes the analysis.

Our paper makes the following contributions:

– We present a novel way to handle opaque code during static analysis by computing a precise on-demand model of the code using (1) input samples that represent analysis states, (2) dynamic execution, and (3) abstraction.


In the remainder of this paper, we present our Sample-Run-Abstract approach to model opaque code for static analysis (Sect. 2) and describe the sampling strategy (Sect. 3) we use. We then discuss our implementation and experiences of applying it to JavaScript analysis (Sect. 4), evaluate the implementation using ECMAScript 5.1 builtin functions as benchmarks (Sect. 5), discuss related work (Sect. 6), and conclude (Sect. 7).

## **2 Modeling via Sample-Run-Abstract**

Our approach models opaque code by designing a universal model, which is able to handle arbitrary opaque code. Rather than generating a specific model for each opaque code statically, it produces a single general model, which produces results for given states using concrete semantics via dynamic execution. We call this universal model the *SRA model*.

In order to create the SRA model for a given static analyzer A and a dynamic executor E, we assume the following:


Then, the SRA model consists of the following three steps:

– *Sample* : Ŝ → ℘(S)

For a given abstract state ŝ ∈ Ŝ, *Sample* chooses a finite set of elements from γ(ŝ), a possible set of values for ŝ. Because it is, in the general case, impossible to execute opaque code dynamically with all possible inputs, *Sample* should select representative elements efficiently as we discuss in the next section.

– *Run* : C × S → S

For a given program point and a concrete state at this point, *Run* generates executable code corresponding to the point and state, executes the code, and returns the result state of the execution.

– *Abstract* : ℘(S) → Ŝ

For a given set of concrete states, *Abstract* produces an abstract state that encompasses the concrete states. One can apply α to each concrete state, join all the resulting abstract states, and optionally apply an over-approximation heuristic *Broaden* : Ŝ → Ŝ, comparable to widening, to mitigate missing behaviors of the opaque code due to the under-approximate sampling.

**Fig. 1.** An abstract domain for even and odd integers

We write the SRA model as ⇓*SRA* : C × Ŝ → Ŝ and define it as follows:

$$\begin{array}{rcl}
\Downarrow_{SRA}(c, \hat{s}) &=& Abstract(\{Run(c, s) \mid s \in Sample(\hat{s})\}) \\
&=& Broaden(\bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in Sample(\hat{s})\})
\end{array}$$
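The definition above can be read as a small higher-order function. The following is a minimal sketch, not the SAFE implementation: the function name `sra`, its keyword parameters, and the toy domain in which abstract states are finite sets of integers are all assumptions made for illustration.

```python
from functools import reduce


def sra(c, s_hat, *, sample, run, alpha, join, broaden=lambda a: a):
    """Compute Broaden(join of alpha({Run(c, s)}) over s in Sample(s_hat))."""
    concrete_inputs = list(sample(s_hat))
    if not concrete_inputs:
        raise ValueError("Sample must return at least one concrete state")
    # Run: execute the opaque code once per sampled concrete state.
    results = [run(c, s) for s in concrete_inputs]
    # Abstract: abstract each result on its own, then join everything.
    joined = reduce(join, (alpha({r}) for r in results))
    return broaden(joined)


# Toy instantiation (an assumption for illustration): abstract states are
# finite sets of integers, so Sample can enumerate gamma exhaustively.
result = sra(
    abs,                               # stands in for the opaque code at c
    frozenset({-2, 0, 2}),             # abstract input state
    sample=lambda s_hat: sorted(s_hat),
    run=lambda code, s: code(s),
    alpha=frozenset,
    join=frozenset.union,
)
print(result)
```

Because the toy domain is finite, sampling is exhaustive here and the result is exact; with an infinite concretization, `sample` would return only a subset and the result could be unsound.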

We now describe how ⇓*SRA* works using an example abstract domain for even and odd integers as shown in Fig. 1. Let us consider the code snippet x := abs(x) at a program point c where the library function abs is opaque. We use maps from variables to their concrete values for concrete states, maps from variables to their abstract values for abstract states, and the identity function for *Broaden* in this example.

*Case* ŝ<sub>1</sub> ≡ [x : n] *where n is a constant integer:*

$$\begin{array}{rcl}
\Downarrow_{SRA}(c, \hat{s}_1) &=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in Sample(\hat{s}_1)\} \\
&=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in \{[\mathtt{x} : n]\}\} \\
&=& \bigsqcup \{\alpha(\{Run(c, [\mathtt{x} : n])\})\} \\
&=& [\mathtt{x} : |n|]
\end{array}$$

Because the given abstract state ŝ<sub>1</sub> contains a single abstract value corresponding to a single concrete value, *Sample* produces the set of all possible states, which makes ⇓*SRA* provide a sound and also the most precise result.

*Case* ŝ<sub>2</sub> ≡ [x : Even]*:*

$$\begin{array}{rcl}
\Downarrow_{SRA}(c, \hat{s}_2) &=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in Sample(\hat{s}_2)\} \\
&=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in \{[\mathtt{x} : -2], [\mathtt{x} : 0], [\mathtt{x} : 2]\}\} \\
&=& \bigsqcup \{\alpha(\{[\mathtt{x} : 0], [\mathtt{x} : 2]\})\} \\
&=& [\mathtt{x} : \mathtt{Even}]
\end{array}$$

When *Sample* selects three elements from the set of all possible states represented by ŝ<sub>2</sub>, executing abs results in {[x : 0], [x : 2]}. Since joining these two abstract states produces Even, ⇓*SRA* models the correct behavior of abs by taking advantage of the abstract domain.

*Case* ŝ<sub>3</sub> ≡ [x : Int]*:*

$$\begin{array}{rcl}
\Downarrow_{SRA}(c, \hat{s}_3) &=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in Sample(\hat{s}_3)\} \\
&=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in Sample(\hat{s}_2) \cup Sample([\mathtt{x} : \mathtt{Odd}])\} \\
&=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in \{[\mathtt{x} : -2], [\mathtt{x} : -1], [\mathtt{x} : 0], [\mathtt{x} : 1], [\mathtt{x} : 2], [\mathtt{x} : 3]\}\} \\
&=& \bigsqcup \{\alpha(\{[\mathtt{x} : 0], [\mathtt{x} : 1], [\mathtt{x} : 2], [\mathtt{x} : 3]\})\} \\
&=& [\mathtt{x} : \mathtt{Int}]
\end{array}$$

When an abstract value has a finite number of elements that are immediately below it in the abstract domain lattice, our sampling strategy selects samples from them recursively. Thus, in this example, *Sample*([x : Int]) becomes the union of *Sample*([x : Even]) and *Sample*([x : Odd]). We explain this recursive sampling strategy in Sect. 3.

*Case* ŝ<sub>4</sub> ≡ [x : Odd]*:*

$$\begin{array}{rcl}
\Downarrow_{SRA}(c, \hat{s}_4) &=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in Sample(\hat{s}_4)\} \\
&=& \bigsqcup \{\alpha(\{Run(c, s)\}) \mid s \in \{[\mathtt{x} : -1], [\mathtt{x} : 1]\}\} \\
&=& \bigsqcup \{\alpha(\{[\mathtt{x} : 1]\})\} \\
&=& [\mathtt{x} : 1]
\end{array}$$

While ⇓*SRA* produces sound and precise results for the first three cases, it does not guarantee soundness; it may miss some behaviors of opaque code due to the limitations of the sampling strategy. Let us assume that *Sample*([x : Odd]) selects only {[x : −1], [x : 1]} this time, as in the fourth case. Then, the model produces an unsound result [x : 1], which does not cover the other odd integers that abs can return, because the selected values explore only partial behaviors of abs. When the number of possible states at a call site of opaque code is infinite, the sampling strategy can lead to unsound results. A well-designed sampling strategy is crucial for our modeling approach; it affects the analysis performance and soundness significantly. The approach is precise thanks to under-approximated results from sampling, but entails a tradeoff between the analysis performance and soundness depending on the number of samples. In the next section, we propose a strategy to generate samples for various abstract domains and to control sample sizes effectively.
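The four cases can be replayed with a short script. This is a sketch under our reading of the lattice in Fig. 1 (constants at the bottom, then Even and Odd, then Int); the names `parity`, `join`, `sample`, and `sra_abs` are hypothetical, and Python's built-in `abs` stands in for the opaque library function.

```python
from functools import reduce

BOT, EVEN, ODD, INT = "Bot", "Even", "Odd", "Int"


def parity(a):
    """Map a constant to Even/Odd; leave Even, Odd, and Int unchanged."""
    if isinstance(a, int):
        return EVEN if a % 2 == 0 else ODD
    return a


def join(a, b):
    """Least upper bound in the assumed constant/parity lattice."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == b:
        return a
    return parity(a) if parity(a) == parity(b) else INT


def sample(a, odd_seeds=(-1, 1, 3)):
    """Concretize an abstract value; Int recurses into Even and Odd."""
    if isinstance(a, int):
        return [a]
    if a == EVEN:
        return [-2, 0, 2]
    if a == ODD:
        return list(odd_seeds)
    if a == INT:
        return sample(EVEN, odd_seeds) + sample(ODD, odd_seeds)
    return []


def sra_abs(a, odd_seeds=(-1, 1, 3)):
    """Sample, run abs, and join; alpha of a singleton is the constant itself."""
    results = [abs(x) for x in sample(a, odd_seeds)]
    return reduce(join, results, BOT)


print(sra_abs(-7))                      # case 1: a constant input
print(sra_abs(EVEN))                    # case 2
print(sra_abs(INT))                     # case 3
print(sra_abs(ODD, odd_seeds=(-1, 1)))  # case 4: too few samples
```

With the seeds (−1, 1) the fourth call yields the constant 1, which misses results such as abs(3) = 3; adding the seed 3 makes the same call return Odd.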

## **3 Combinatorial Sampling Strategy**

We propose to use a combinatorial sampling strategy (inspired by combinatorial testing) that is organized by the types of values that an abstract domain represents. The domains represent either *primitive* values like number and string, or *object* values like tuple, set, and map. Based on combinatorial testing, our strategy is recursively defined on the hierarchy of abstract domains used to represent program states. Assume that â, b̂ ∈ Â are abstract values that we want to concretize using *Sample*.

**Fig. 2.** The SAFE number domain for JavaScript

## **3.1 Abstract Domains for Primitive Values**

To explain our sampling strategy for primitive abstract domains, we use the DefaultNumber domain from SAFE as an example. DefaultNumber represents JavaScript numbers with subcategories as shown in Fig. 2. The subcategories are NaN (not a number), ±Inf (positive/negative infinity), UInt (unsigned integer), and NUInt (not an unsigned integer, which is a negative integer or a floating point number).

*Case* |γ(â)| = constant*:*

$$Sample(\hat{a}) = \gamma(\hat{a})$$

When â represents a finite number of concrete values, *Sample* simply takes all the values. For example, ±Inf has two possible values, +Inf and -Inf. Therefore, *Sample*(±Inf) = {+Inf, -Inf}.

*Case* |γ(â)| = ∞ *and* |{b̂ ∈ Â | â covers b̂}| = constant*:*

$$Sample(\hat{a}) = \bigcup_{\hat{b} \,:\, \hat{a} \text{ covers } \hat{b}} Sample(\hat{b})$$

When â represents an infinite number of concrete values, but it *covers* (that is, is immediately preceded by) a finite number of abstract values in the lattice, *Sample* applies to each predecessor recursively and merges the concrete results by set union. Note that "y covers x" holds whenever x ⊏ y and there is no z such that x ⊏ z ⊏ y. The number of samples increases linearly in this step. Number falls into this case. It represents infinitely many numbers, but it covers four abstract values in the lattice: NaN, ±Inf, UInt, and NUInt.

*Case* |γ(â)| = ∞ *and* |{b̂ ∈ Â | â covers b̂}| = ∞*:*

$$Sample(\hat{a}) = H(\gamma(\hat{a}))$$

When â represents infinitely many concrete values and also covers infinitely many abstract values, we make the number of samples finite by applying a heuristic injection H of seed samples. For seed samples, we propose the following guidelines to manually select them:


In the DefaultNumber domain example, UInt and NUInt fall into this case. For the evaluation of our modeling approach in Sect. 5, we selected seed samples based on the guidelines as follows:

*Sample*(UInt) = {0, 1, 3, 10, 9999}

*Sample*(NUInt) = {−10, −3, −1, −0.5, −0, 0.5, 3.14}

We experimentally show that this simple heuristic works well for automatic modeling of JavaScript builtin functions.
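The three cases for primitive domains translate into a short recursive function. The sketch below is illustrative only: the table names `FINITE_GAMMA`, `COVERS`, and `SEEDS` are hypothetical, the lattice shape follows our reading of Fig. 2, and concrete values are written as JavaScript literal strings so that −0 and 0 stay distinct.

```python
# Case 1: abstract values with finitely many concrete values.
FINITE_GAMMA = {
    "NaN": ["NaN"],
    "±Inf": ["Infinity", "-Infinity"],
}

# Case 2: abstract values that cover finitely many abstract values.
COVERS = {
    "Number": ["NaN", "±Inf", "UInt", "NUInt"],
}

# Case 3: seed samples chosen by the heuristic H (sets quoted from the text).
SEEDS = {
    "UInt": ["0", "1", "3", "10", "9999"],
    "NUInt": ["-10", "-3", "-1", "-0.5", "-0", "0.5", "3.14"],
}


def sample(a_hat):
    """Return a finite list of concrete samples for an abstract number."""
    if a_hat in FINITE_GAMMA:
        return list(FINITE_GAMMA[a_hat])
    if a_hat in COVERS:
        merged = []
        for b_hat in COVERS[a_hat]:
            for value in sample(b_hat):   # recurse into each covered value
                if value not in merged:   # set union that keeps the order
                    merged.append(value)
        return merged
    if a_hat in SEEDS:
        return list(SEEDS[a_hat])
    raise KeyError(f"unknown abstract value: {a_hat}")


print(sample("Number"))
```

Sampling Number therefore costs the sum of the sample counts of the four values it covers, which is the linear growth mentioned for the second case.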

#### **3.2 Abstract Domains for Object Values**

Our sampling strategy for object abstract domains consists of four steps. To sample from a given abstract object â ∈ Â, we assume the following:

– A concrete object a ∈ γ(â) is a map from fields to their values: *Map*[F, V].

– Abstract domains for fields and values are F̂ and V̂, respectively.


Then, the sampling strategy follows the next four steps:

1. Sampling fields

In order to construct sampled objects, it first samples a finite number of fields. JavaScript provides open objects, where fields can be added and removed dynamically, and fields can be referenced not only by string literals but also by arbitrary expressions of string values. Thus, this step collects fields from a finite set of fields that all possible objects should contain (F<sub>must</sub>) and samples from a possibly infinite set of fields that some possible objects may (but not must) contain (F<sub>may</sub>):

$$\begin{array}{l}
F_{must} = mustF(\hat{a}) \\
F_{may} = Sample(mayF(\hat{a})) \setminus F_{must}
\end{array}$$

2. Abstracting values for the sampled fields

For the fields in F<sub>must</sub> and F<sub>may</sub> sampled from the given abstract object â, it constructs two maps from fields to their abstract values, M<sub>must</sub> and M<sub>may</sub>, respectively, of type *Map*[F, V̂]:

$$\begin{array}{l}
M_{must} = \lambda f \in F_{must}.\ \alpha(\{a(f) \mid a \in \gamma(\hat{a})\}) \\
M_{may} = \lambda f \in F_{may}.\ \alpha(\{a(f) \mid a \in \gamma(\hat{a})\})
\end{array}$$

3. Sampling values

From M<sub>must</sub> and M<sub>may</sub>, it constructs another map M<sub>s</sub> : F → ℘(V<sub>∄</sub>), where V<sub>∄</sub> = V ∪ {∄} denotes a set of values extended with ∄, which represents the absence of a field, by applying *Sample* to the value of each field in F<sub>must</sub> and F<sub>may</sub>. The value of each field in F<sub>may</sub> contains ∄ to denote that the field may not exist in M<sub>s</sub>:

$$M\_s = \lambda f \in F\_{must} \cup F\_{may}. \begin{cases} Sample(M\_{must}(f)) & \text{if } f \in F\_{must} \\ Sample(M\_{may}(f)) \cup \{\nexists\} & \text{if } f \in F\_{may} \end{cases}$$

#### 4. Choosing samples by combinatorial testing

Finally, since the number of all combinations from M<sub>s</sub>, ∏<sub>f ∈ Domain(M<sub>s</sub>)</sub> |M<sub>s</sub>(f)|, grows exponentially, the last step limits the number of selections. We solve this selection problem by reducing it to a traditional testing problem with combinatorial testing [3]. Combinatorial testing is a well-studied problem and efficient algorithms for generating test cases exist. It addresses a similar problem to ours, increasing dynamic coverage of code under test, but in the context of finding bugs:

"The most common bugs in a program are generally triggered by either a single input parameter or an interaction between pairs of parameters."

Thus, we apply each-used or pair-wise testing (1 or 2-wise) as the last step.
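To make the pair-wise criterion concrete, here is a minimal greedy sketch. It is not the algorithm of any particular combinatorial testing tool: the helper names are hypothetical, and it enumerates the full product of candidates, which is only acceptable for tiny inputs like the one shown.

```python
from itertools import combinations, product


def all_pairs(params):
    """Every ((field1, value1), (field2, value2)) over two distinct fields."""
    pairs = set()
    for f1, f2 in combinations(sorted(params), 2):
        for v1, v2 in product(params[f1], params[f2]):
            pairs.add(((f1, v1), (f2, v2)))
    return pairs


def pairs_of(case):
    """The value pairs that a single test case covers."""
    return {
        ((f1, case[f1]), (f2, case[f2]))
        for f1, f2 in combinations(sorted(case), 2)
    }


def pairwise(params):
    """Greedy 2-wise selection: take the case covering most uncovered pairs."""
    fields = sorted(params)
    candidates = [
        dict(zip(fields, values))
        for values in product(*(params[f] for f in fields))
    ]
    uncovered = all_pairs(params)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


# Three hypothetical boolean-like fields: 8 exhaustive combinations.
params = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
cases = pairwise(params)
print(len(cases), "cases instead of 8")
```

The greedy choice does not guarantee a minimum-size set, but every pair of values from two different fields appears in at least one selected case.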

Now, we demonstrate each step using an abstract array object â, whose length is greater than or equal to 2 and the elements of which are true or false. We write ⊤<sub>b</sub> to denote an abstract value such that γ(⊤<sub>b</sub>) = {true, false}.

– Assumptions

	- A concrete array object a is a map from indices to boolean values: *Map*[UInt, Boolean].
	- For the given abstract object â, *mustF*(â) = {0, 1} and *mayF*(â) = UInt.
	- From Sect. 3.1, we sample {0, 1, 3, 10, 9999} for UInt.
	- k-*wise*(M) generates a minimum-size set of test cases satisfying all the requirements of k-*wise* testing for a map M. It constructs a test case by choosing one element from a set on each field.

– Step 1: Sampling fields

$$\begin{array}{l}
F_{must} = \{0, 1\} \\
F_{may} = Sample(\mathtt{UInt}) \setminus F_{must} = \{3, 10, 9999\}
\end{array}$$

– Step 2: Abstracting values for the sampled fields

$$\begin{array}{l} M\_{must} = [0 \mapsto \top\_b, 1 \mapsto \top\_b] \\ M\_{may} = [3 \mapsto \top\_b, 10 \mapsto \top\_b, 9999 \mapsto \top\_b] \end{array}$$

– Step 3: Sampling values

$$\begin{array}{l}
M_s = [ \\
\quad 0 \mapsto \{\mathtt{true}, \mathtt{false}\}, \quad 1 \mapsto \{\mathtt{true}, \mathtt{false}\}, \\
\quad 3 \mapsto \{\mathtt{true}, \mathtt{false}, \nexists\}, \quad 10 \mapsto \{\mathtt{true}, \mathtt{false}, \nexists\}, \\
\quad 9999 \mapsto \{\mathtt{true}, \mathtt{false}, \nexists\} \ ]
\end{array}$$

– Step 4: Choosing samples by combinatorial testing

The number of all combinations ∏<sub>f ∈ Domain(M<sub>s</sub>)</sub> |M<sub>s</sub>(f)| is 108 even after sampling fields and values in an under-approximate manner. We can avoid such explosion of samples and manage well-distributed samples by using combinatorial testing. With each-used testing, three combinations can cover every element in a set on each field at least once:

$$\begin{array}{l}
1\text{-}wise(M_s) = \{ \\
\quad [0 \mapsto \mathtt{true},\ 1 \mapsto \mathtt{false},\ 3 \mapsto \mathtt{true},\ 10 \mapsto \nexists,\ 9999 \mapsto \nexists], \\
\quad [0 \mapsto \mathtt{false},\ 1 \mapsto \mathtt{true},\ 3 \mapsto \mathtt{false},\ 10 \mapsto \mathtt{false},\ 9999 \mapsto \mathtt{true}], \\
\quad [0 \mapsto \mathtt{false},\ 1 \mapsto \mathtt{true},\ 3 \mapsto \nexists,\ 10 \mapsto \mathtt{true},\ 9999 \mapsto \mathtt{false}] \ \}
\end{array}$$

With pair-wise testing, 12 samples can cover every pair of elements from different sets at least once.
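The four steps of the array example can be replayed in a few lines. This is a sketch with hypothetical names (`ABSENT`, `each_used`); Step 2 is trivial here because every sampled field maps to the same abstract boolean, and the round-robin each-used generator is one simple choice that yields three samples, though not necessarily the same three listed above.

```python
from math import prod

ABSENT = "∄"  # marker for "this field does not exist"

must_f = [0, 1]                       # mustF of the abstract array
sample_uint = [0, 1, 3, 10, 9999]     # Sample(UInt) from Sect. 3.1
top_b = [True, False]                 # Sample of the abstract boolean

# Step 1: sampling fields.
f_must = list(must_f)
f_may = [f for f in sample_uint if f not in f_must]

# Steps 2 and 3: every field abstracts to the top boolean, which is then
# sampled; possibly absent fields additionally get the ABSENT marker.
m_s = {f: list(top_b) for f in f_must}
m_s.update({f: list(top_b) + [ABSENT] for f in f_may})

total = prod(len(values) for values in m_s.values())


def each_used(m):
    """1-wise selection: cycle through each field's values round-robin."""
    rounds = max(len(values) for values in m.values())
    return [
        {f: values[i % len(values)] for f, values in m.items()}
        for i in range(rounds)
    ]


cases = each_used(m_s)
# Materialize concrete arrays by dropping the fields marked as absent.
objects = [{f: v for f, v in c.items() if v != ABSENT} for c in cases]
print(total, len(cases))
```

Each generated case uses every value of every field at least once across the set, so the number of samples is bounded by the largest per-field sample set rather than by the product.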

## **4 Implementation**

We implemented our automatic modeling approach for JavaScript because of its large number of builtin APIs and complex libraries, which are all opaque code for static analysis. They include the functions in the ECMAScript language standard [1] and web standards such as DOM and browser APIs. We implemented the modeling as an extension of SAFE [13,17], a JavaScript static analyzer. When the analyzer encounters calls of opaque code during analysis, it uses the SRA model of the code.

*Sample.* We applied the combinatorial sampling strategy for the SAFE abstract domains. Of the abstract domains for primitive JavaScript values, UInt, NUInt, and OtherStr represent an infinite number of concrete values (cf. the third case in Sect. 3.1) and thus require the use of heuristics. We describe the details of our heuristics and sample sets in Sect. 5.1.

We implemented the *Sample* step to use "each-used sample generation" for object abstract domains by default. In order to generate more samples, we added three options to apply pair-wise generation:


As an exception, we use the all-combination strategy for the DefaultDataProp domain representing a JavaScript property, consisting of a value and three booleans: writable, enumerable, and configurable. The number of combinations of the three booleans is limited to 2<sup>3</sup>, and we consider the resulting linear increase of samples acceptable. Note that we use *field* for language-independent objects and *property* for JavaScript objects. The *Sample* step returns a finite set of concrete states, and each element in the set, which in turn contains concrete values only, is passed to the *Run* step.
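The all-combination exception is cheap because the boolean part is fixed at eight combinations. A minimal sketch, assuming a hypothetical helper `data_prop_samples` and a dictionary encoding of a data property:

```python
from itertools import product


def data_prop_samples(value_samples):
    """All combinations of a sampled value with the three boolean attributes."""
    flags = list(product([True, False], repeat=3))  # 2**3 = 8 combinations
    return [
        {"value": v, "writable": w, "enumerable": e, "configurable": c}
        for v in value_samples
        for (w, e, c) in flags
    ]


samples = data_prop_samples(["0", "1"])
print(len(samples))
```

The sample count is eight times the number of value samples, so it grows linearly in the size of the value sample set.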

*Run.* For each concrete input state, the *Run* step obtains a result state by executing the corresponding opaque code in four steps:

1. Generation of executable code

First, *Run* populates object values from the concrete state. We currently omit the JavaScript scope-chain information, because the library functions that we analyze as opaque code are independent from the scope of user code. It derives executable code to invoke the opaque code and adds argument values from the static analysis context.


After execution, the result state contains the objects from the input state, the return value of the opaque code, and all the values that it might refer to. Also, any mutation of objects of the input state as well as newly created objects are captured in this way. We use a snapshot module of SAFE to serialize the result state into a JSON-like format.

4. Transfer of the state to the analyzer

The serialized snapshot is then passed to SAFE, where it is parsed, loaded, and combined with other results as a set of concrete result states.
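The overall shape of the *Run* step (generate code, execute it in Node.js, serialize the result, hand it back) can be sketched as follows. This is a heavily simplified illustration and not SAFE's code: the function names are hypothetical, only the return value is captured (no heap snapshot or object mutation), and plain `JSON.stringify` loses values such as NaN or undefined.

```python
import json
import shutil
import subprocess


def generate_call_code(callee, this_arg, args):
    """Build JavaScript source that calls `callee` and prints the result as JSON.

    `callee` and `this_arg` are JavaScript source fragments; `args` are
    Python values rendered as JavaScript literals via json.dumps.
    """
    js_args = ", ".join([this_arg] + [json.dumps(a) for a in args])
    return f"console.log(JSON.stringify({{result: {callee}.call({js_args})}}));"


def run(callee, this_arg, args):
    """Execute the generated code with Node.js and parse the serialized result."""
    node = shutil.which("node")
    if node is None:
        raise RuntimeError("Node.js was not found on PATH")
    completed = subprocess.run(
        [node, "-e", generate_call_code(callee, this_arg, args)],
        capture_output=True, text=True, check=True, timeout=10,
    )
    # An undefined result is dropped by JSON.stringify, hence .get().
    return json.loads(completed.stdout).get("result")


code = generate_call_code("Math.abs", "undefined", [-1])
print(code)
```

Calling `run("Math.abs", "undefined", [-1])` requires a local Node.js installation; the code generation part can be inspected without one.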

*Abstract.* To abstract result states, we mostly used existing operations in SAFE, like lattice-*join*, and also implemented an over-approximation heuristic function, *Broaden*, comparable to widening. We use *Broaden* for property name sets in JavaScript objects, because *mayF* of a JavaScript abstract object can produce an abstract value that denotes an infinite set of concrete strings, and because ⇓*SRA* cannot produce such an abstract value from simple sampling and *join*. Thus, we regard all possibly absent properties as sampled properties. Then, we implemented the *Broaden* function to merge all possibly absent properties into one abstract property representing any property when the number of absent properties is greater than a certain threshold proportional to the number of sampled properties.
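One possible reading of this *Broaden* heuristic is sketched below. Everything concrete in it is an assumption: the encoding of an abstract object as a dictionary from property names to a pair (set of values, possibly-absent flag), the wildcard key `ANY_PROPERTY`, and the default `ratio` of 0.5 are chosen for illustration and are not taken from SAFE.

```python
ANY_PROPERTY = "*"  # hypothetical key standing for "any property name"


def broaden(obj, ratio=0.5):
    """Merge possibly absent properties once they exceed ratio * len(obj)."""
    absent = {
        name: values
        for name, (values, maybe_absent) in obj.items()
        if maybe_absent
    }
    if len(absent) <= ratio * len(obj):
        return dict(obj)                      # below the threshold: unchanged
    merged = frozenset().union(*absent.values())
    kept = {name: entry for name, entry in obj.items() if not entry[1]}
    kept[ANY_PROPERTY] = (merged, True)       # one summary property
    return kept


abstract_array = {
    "length": (frozenset({2}), False),
    "0": (frozenset({True}), False),
    "3": (frozenset({True, False}), True),
    "10": (frozenset({False}), True),
    "9999": (frozenset({True}), True),
}
print(sorted(broaden(abstract_array)))
```

Here three of five sampled properties are possibly absent, which exceeds the assumed threshold, so they collapse into a single summary entry; with `ratio=1.0` the object is returned unchanged.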

## **5 Evaluation**

We evaluated the ⇓*SRA* model in two regards, (1) the feasibility of replacing existing manual models (RQ1 and RQ2) and (2) the effects of our heuristic H on the analysis soundness (RQ3). The research questions are as follows:

– **RQ1: Analysis performance of** ⇓*SRA*

Can ⇓*SRA* replace existing manual models for program analysis with decent performance in terms of soundness, precision, and runtime overhead?

– **RQ2:** Is ⇓*SRA* broadly applicable to various builtin functions of JavaScript?

– **RQ3:** How much is the performance of ⇓*SRA* affected by the heuristics?

After describing the experimental setup for evaluation, we present our answers to the research questions with quantitative results, and discuss the limitations of our evaluation.

#### **5.1 Experimental Setup**

In order to evaluate the ⇓*SRA* model, we compared the analysis performance and applicability of ⇓*SRA* with those of the existing manual models in SAFE. We used two kinds of subjects: browser benchmark programs and builtin functions. From 34 browser benchmarks included in the test suite of SAFE, a subset of V8 Octane<sup>1</sup>, we collected 13 of them that invoke opaque code. Since browser benchmark programs use a small number of opaque functions, we also generated test cases for 134 functions in the ECMAScript 5.1 specification.

Each test case contains abstract values that represent two or more possible values. Because SAFE uses a finite number of abstract domains for primitive values, we used all of them in the test cases. We also generated 10 abstract objects. Five of them are manually created to represent arbitrary objects:

OBJ1 has an arbitrary property whose value is an arbitrary primitive.

OBJ2 is a property descriptor whose "value" is an arbitrary primitive, and the others are arbitrary booleans.

OBJ3 has an arbitrary property whose value is OBJ2.

OBJ4 is an empty array whose "length" is arbitrary.

OBJ5 is an arbitrary-length array with an arbitrary property.

The other five objects were collected from SunSpider benchmark programs by using Jalangi2 [20] to represent frequently used abstract objects. We counted the number of function calls with object arguments and joined the most used object arguments in each program. Out of 10 programs that have function calls with object arguments, we discarded four programs that use the same objects for every function call, and one program that uses an argument with 2500 properties, which makes manual inspection impossible. We joined the first 10 concrete objects for each argument of the following benchmarks to obtain abstract objects: 3d-cube.js, 3d-raytrace.js, access-binary-trees.js, regexp-dna.js, and string-fasta.js. For the 134 test functions, when a test function consumes two or more arguments, we restricted each argument to have only an expected type to manage the number of test cases. Also, for functions with a variable number of arguments, we used either one argument or the minimum number of arguments.

In summary, we used 13 programs for RQ1, and 134 functions with 1565 test cases for RQ2 and RQ3. All experiments were run on a machine with a 2.9 GHz quad-core Intel Core i7 and 16 GB of memory.

<sup>1</sup> https://github.com/chromium/octane.

#### **5.2 Answers to Research Questions**

*Answer to RQ1.* We compared the precision, soundness, and analysis time of the SAFE manual models and the ⇓*SRA* model. Table 1 shows the precision and soundness for each opaque function call, and Table 2 presents the analysis time and number of samples for each program.

As for the precision, Table 1 shows that ⇓*SRA* produced more precise results than manual models for 9 (19.6%) cases. We manually checked whether each result of a model is sound or not by using the partial order function (⊑) implemented in SAFE. We found that all the results of the SAFE manual models for the benchmarks were sound. The ⇓*SRA* model produced an unsound result for only one function: Math.random. While it returns a floating-point value in the range [0, 1), ⇓*SRA* modeled it as NUInt, instead of the expected Number, because it missed 0.
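The Math.random case is easy to reproduce in miniature. In the sketch below, a seeded Python generator stands in for Math.random, and `alpha_number` is a simplified, hypothetical abstraction into the number domain of Fig. 2 (it ignores the upper bound of unsigned integers).

```python
import math
import random
from functools import reduce


def alpha_number(x):
    """Abstract one float into a subcategory of the number domain."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "±Inf"
    if x.is_integer() and math.copysign(1.0, x) > 0:
        return "UInt"          # non-negative integers; -0.0 is excluded
    return "NUInt"


def join(a, b):
    """Two different subcategories join to the top element Number."""
    return a if a == b else "Number"


rng = random.Random(0)                       # stand-in for Math.random
samples = [rng.random() for _ in range(5)]   # values in [0, 1)
modeled = reduce(join, (alpha_number(x) for x in samples))
print(modeled)
```

All observed samples are non-integers, so their join stays at NUInt; only if a run happened to return exactly 0 would the join rise to Number, which is the sound result.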

As shown in Table 2, on average ⇓*SRA* took 1.35 times more analysis time than the SAFE models. The table also shows the number of context-sensitive opaque function calls during analysis (#Call), the maximum number of samples (#Max), and the total number of samples (#Total). To understand the runtime overhead better, we measured the proportion of elapsed time for each step. On average, *Sample* took 59%, *Run* 7%, *Abstract* 17%, and the rest 17%. The experimental results show that ⇓*SRA* provides high precision while slightly sacrificing soundness with modest runtime overhead.

*Answer to RQ2.* Because the benchmark programs use only 15 opaque functions as shown in Table 1, we generated abstracted arguments for 134 functions out of 169 functions in the ECMAScript 5.1 builtin library, for which SAFE has manual models. We semi-automatically checked the soundness and precision of the ⇓*SRA* model by comparing the analysis results with their expected results. Table 3 shows the results in terms of test cases (left half) and functions (right half). The **Equal** column shows the number of test cases or functions, for which both models provide equal results that are sound. The **SRA Pre.** column shows the number of such cases where the ⇓*SRA* model provides sound and more precise results than the manual model. The **Man. Uns.** column presents the number of such cases where ⇓*SRA* provides sound results but the manual one provides unsound results, and **SRA Uns.** shows the opposite case of **Man. Uns.** Finally, **Not Comp.** shows the number of cases where the results of ⇓*SRA* and the manual model are incomparable.
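The column definitions can be read as a decision procedure over the partial order. The sketch below is our interpretation, not the authors' checking script: abstract values are modeled as finite sets with ⊆ as ⊑, a result counts as sound when it is above the expected result, and the extra label "Other" (for combinations the text does not describe) is an assumption.

```python
def classify(sra, manual, expected):
    """Assign a test case to one of the Table 3 columns (assumed reading)."""
    sra_sound = expected <= sra          # expected ⊑ SRA result
    manual_sound = expected <= manual    # expected ⊑ manual result
    if sra_sound and not manual_sound:
        return "Man. Uns."
    if manual_sound and not sra_sound:
        return "SRA Uns."
    if sra_sound and manual_sound:
        if sra == manual:
            return "Equal"
        if sra < manual:                 # strictly more precise
            return "SRA Pre."
    if not (sra <= manual or manual <= sra):
        return "Not Comp."
    return "Other"                       # not described in the text


fs = frozenset
print(classify(fs({1, 2}), fs({1, 2, 3}), fs({1, 2})))
```

For example, when both results contain the expected values and the SRA result is a strict subset of the manual one, the case lands in the **SRA Pre.** column.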

The ⇓*SRA* model produced sound results for 99.4% of test cases and 94.0% of functions. Moreover, ⇓*SRA* produced more precise results than the manual models for 33.7% of test cases and 50.0% of functions. Although ⇓*SRA* produced unsound results for 0.6% of test cases and 6.0% of functions, we found soundness bugs in the manual models using 1.3% of test cases and 7.5% of functions. Our experiments showed that the automatic ⇓*SRA* model produced fewer unsound results than the manual models. We reported the manual models producing unsound results to SAFE developers with the concrete examples that were generated in the *Run* step, which revealed the bugs.


**Table 1.** Precision and soundness by functions in the benchmarks

**Table 2.** Analysis time overhead by programs in the benchmarks


*Answer to RQ3.* The sampling strategy plays an important role in the performance of ⇓*SRA* especially for soundness. Our sampling strategy depends on two factors: (1) manually sampled sets via the heuristic H and (2) each-used or pair-wise selection for object samples. We used manually sampled sets for three abstract values: UInt, NUInt, and OtherStr. To sample concrete values from them, we used three methods: Base simply follows the guidelines described in Sect. 3.1, Random generates samples randomly, and Final denotes the heuristics determined by trial and error to reach the highest ratio of sound results. For object samples, we used three pair-wise options: HeapPair, ThisPair, and ArgPair. For various sampling configurations, Table 4 summarizes the ratio of sound results, the average and maximum numbers of samples for the test cases used in RQ2.

**Table 3.** Precision and soundness for the builtin functions

**Table 4.** Soundness and sampling cost for the builtin functions

The table shows that Base and Random produced sound results for 85.0% and 84.9% (the worst case among 10 repetitions) of the test cases, respectively. Even without any sophisticated heuristics or pair-wise options, ⇓*SRA* achieved a decent amount of sound results. Using more samples collected by trial and error with Final and all three pair-wise options, ⇓*SRA* generated sound results for 99.4% of the test cases by observing more behaviors of opaque code.

## **5.3 Limitations**

A fundamental limitation of our approach is that the ⇓*SRA* model may produce unsound results when the behavior of opaque code depends on values that ⇓*SRA* does not support via sampling. For example, if a sampling strategy calls the Date function without enough time intervals, it may not be able to sample different results. Similarly, if a sampling strategy does not use 4-wise combinations for property descriptor objects that have four components, it cannot produce all the possible combinations. However, at the same time, simply applying more complex strategies like 4-wise combinations may lead to an explosion of samples, which is not scalable.

Our experimental evaluation is inherently limited to a specific use case, which poses a threat to validity. While our approach itself is not dependent on a particular programming language or static analysis, the implementation of our approach depends on the abstract domains of SAFE. Although the experiments used well-known benchmark programs as analysis subjects, they may not be representative of all common uses of opaque functions in JavaScript applications.

## **6 Related Work**

When a textual specification or documentation is available for opaque code, one can generate semantic models by mining them. Zhai *et al.* [26] showed that natural language processing can successfully generate models for Java library functions and used them in the context of taint analysis for Android applications. Researchers also created models automatically from types written in WebIDL or TypeScript declarations to detect Web API misuses [2,16].

Given an executable (e.g. binary) version of opaque code, researchers also synthesized code by sampling the inputs and outputs of the code [7,10,12,19]. Heule *et al.* [8] collected partial execution traces, which capture the effects of opaque code on user objects, followed by code synthesis to generate models from these traces. This approach works in the absence of any specification and has been demonstrated on array-manipulating builtins.

While all of these techniques are a-priori attempts to generate general-purpose models of opaque code, to be usable for other analyses, researchers also proposed to construct models during analysis. Madsen *et al.*'s approach [14] infers models of opaque functions by combining pointer analysis and use analysis, which collects expected properties and their types from given application code. Hirzel *et al.* [9] proposed an online pointer analysis for Java, which handles native code and reflection via dynamic execution that ours also utilizes. While both approaches use only a finite set of pointers as their abstract values, ignoring primitive values, our technique generalizes such online approaches to be usable for all kinds of values in a given language.

Opaque code also matters in other program analyses, such as model checking and symbolic execution. Shafiei and Breugel [22] proposed *jpf-nhandler*, an extension of Java PathFinder (JPF), which transfers execution between JPF and the host JVM by on-the-fly code generation. It does not need concretization and abstraction since a JPF object represents a concrete value. In the context of symbolic execution, concolic testing [21] and other hybrid techniques that combine path solving with random testing [18] have been used to overcome the problems posed by opaque code, albeit sacrificing completeness [4].

Even when the source code of external libraries is available, substituting models for the external code rather than analyzing it is useful for reducing the time and memory that an analysis takes. Palepu *et al.* [15] generated summaries by abstracting concrete data dependencies of library functions observed on a training execution to avoid heavy execution of instrumented code. In model checking, Tkachuk *et al.* [24,25] generated over-approximated summaries of environments by points-to and side-effect analyses and presented a static analysis tool, OCSEGen [23]. Another tool, Modgen [5], applies a program slicing technique to reduce the complexity of library classes.

## **7 Conclusion**

Creating semantic models for static analysis by hand is complex, time-consuming and error-prone. We present a Sample-Run-Abstract approach (⇓*SRA*) as a promising way to perform static analysis in the presence of opaque code using automated on-demand modeling. We show how ⇓*SRA* can be applied to the abstract domains of an existing JavaScript static analyzer, SAFE. For benchmark programs and 134 builtin functions with 1565 abstracted inputs, a tuned ⇓*SRA* produced more sound results than the manual models, as well as concrete examples revealing bugs in the manual models. Although not all opaque code may be suitable for modeling with ⇓*SRA*, it reduces the number of hand-written models a static analyzer must provide. Future work on ⇓*SRA* could focus on orthogonal testing techniques that can be used for sampling complex objects, and on practical optimizations, such as caching of computed model results.

**Acknowledgment.** This work has received funding from National Research Foundation of Korea (NRF) (Grants NRF-2017R1A2B3012020 and 2017M3C4A7068177).

## **References**




**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **SMT-Based Bounded Schedulability Analysis of the Clock Constraint Specification Language**

Min Zhang<sup>1</sup>, Fu Song<sup>2(✉)</sup>, Frédéric Mallet<sup>3</sup>, and Xiaohong Chen<sup>1</sup>

<sup>1</sup> Shanghai Key Laboratory of Trustworthy Computing, ECNU, Shanghai, China

<sup>2</sup> ShanghaiTech University, Shanghai, China, songfu@shanghaitech.edu.cn

<sup>3</sup> Université Côte d'Azur, CNRS, Inria, I3S, Nice, France

**Abstract.** The Clock Constraint Specification Language (CCSL) is a formalism for specifying logical-time constraints on events for the design of real-time embedded systems. A central verification problem of CCSL is to check whether events are schedulable under logical constraints. Although many efforts have been made addressing this problem, the problem is still open. In this paper, we show that the bounded scheduling problem is NP-complete and then propose an efficient SMT-based decision procedure which is sound and complete. Based on this decision procedure, we present a sound algorithm for the general scheduling problem. We implement our algorithm in a prototype tool and illustrate its utility in schedulability analysis in designing real-world systems and automatic proving of algebraic properties of CCSL constraints. Experimental results demonstrate its effectiveness and efficiency.

**Keywords:** SMT · CCSL · Schedulability · Logical time · Real-time system

## **1 Introduction**

Model-based design has been widely used, particularly in the design of safety-critical real-time embedded systems. It has achieved industrial success through languages such as SCADE [12], AADL [15] and UML MARTE [26]. For example, UML MARTE provides syntactic annotations to implement, when the context allows, classical real-time scheduling algorithms such as EDF (Earliest Deadline First). It also provides a domain-specific language, the Clock Constraint Specification Language (CCSL) [3], to express the real-time behaviors of a system under development as logical constraints on system events, independently of any physical time and of classical real-time scheduling algorithms. CCSL has been used in several industrial scenarios such as vehicle systems [16] and cyber-physical systems [10,22].

This work is supported by NSFC grants 61872146, 61532019 and 61761136011.

Model-based design usually starts with coarse-grained logical models that are progressively refined into more concrete ones until the final code deployment. It is well known that the earlier one can detect and fix bugs in the refinement process, the better [7]. Therefore, it is critical to provide efficient methods and tools to check safety, liveness and schedulability on the logical models and not only on the definitive deployed system. This has motivated a large body of work on verifying whether events are schedulable under a set of constraints expressed in CCSL [11,21,28,33,35,36,38], though the decidability of this problem is still open. These works first transform CCSL constraints into other formal representations such as transition systems [21], Promela [35], Büchi automata [36], timed automata [33], rewriting logics [38], instant relations [28], or timed-interval logics [11], and then apply existing tools. However, these approaches usually suffer from the state explosion problem. Moreover, most of them deal only with the so-called safe subset of CCSL, and the others provide only semi-algorithms. In our earlier work [39], we proposed an SMT-based verification approach to CCSL and demonstrated several applications of the approach to finding schedules, verifying temporal properties, proving constraint entailment, and analyzing the validity of system traces. Based on the approach, we implemented an efficient tool for verifying LTL properties of CCSL [40].

In this work, we focus on the scheduling problem of CCSL, a fundamental problem to which the aforementioned verification problems of CCSL can be reduced. We first prove that the *bounded* scheduling problem of CCSL with fixed bounds is NP-complete. To our knowledge, this is the first result regarding the complexity of the scheduling problem of CCSL. Then, we propose a decision procedure for the bounded scheduling problem with a given bound. The decision procedure is based on the transformation of CCSL into SMT formulas [39]. It is sound, complete, and efficient in practice. Based on this decision procedure, we turn to the general (i.e., unbounded) scheduling problem and present a binary-search based algorithm. Our algorithm is sound, i.e., if it reports that the constraints are schedulable or unschedulable, then the result is conclusive. We implemented our algorithms in a prototype tool. The tool was used to analyze a real-world interlocking system in a rail transit system. Using the proposed approach, we also prove some algebraic properties of CCSL. The experimental results demonstrate the effectiveness and efficiency of the SMT-based approach.

The rest of this paper is organised as follows: Section 2 introduces CCSL. Section 3 defines the (bounded) scheduling problem of CCSL and shows that the bounded case is NP-complete. Section 4 presents an SMT-based decision procedure for the bounded scheduling problem and a sound algorithm for the general scheduling problem. Section 5 shows a case study and experimental results. Section 6 discusses related work, and Section 7 concludes the paper.

## **2 The Clock Constraint Specification Language**

#### **2.1 Logical Clock, History and Schedule**

In CCSL, clocks are used to model occurrences of events, where a clock ticks when the corresponding event occurs. For instance, a clock may represent an event such as the dispatch of a task, a communication between tasks, or the acquisition of a shared resource by a task. Constraints over clocks are used to specify causal and temporal relations between system events. No global physical time is presumed for the clocks and their constraints. This feature allows CCSL to define a polychronous specification of a system at a logical level.

**Definition 1 (Logical clock).** *A (logical) clock $c$ is an infinite sequence of ticks $(c^i)_{i \in \mathbb{N}^+}$ with each $c^i$ being* tick *or* idle*, where $\mathbb{N}^+$ denotes the set of all the non-zero natural numbers.*

The value of $c^i$ denotes whether an event associated with $c$ occurs or not at step $i$. If $c^i$ is *tick*, then the event occurs; otherwise it does not. In particular, we denote by **1** a global reference logical clock that ticks at every step.

**Definition 2 (Schedule).** *Given a set $C$ of clocks, a* schedule *of $C$ is a total function $\delta : \mathbb{N}^+ \to 2^C$ such that $\forall i \in \mathbb{N}^+$, $\delta(i) = \{c \in C \mid c^i = \textit{tick}\}$ and $\delta(i) \neq \emptyset$.*

Intuitively, a schedule $\delta$ defines a partial order between the ticks of the clocks. $\delta(i)$ is a subset of $C$ such that $c \in \delta(i)$ iff $c$ ticks at step $i$. The condition $\delta(i) \neq \emptyset$ expresses that step $i$ cannot be empty. This forbids stuttering steps in schedules. As one can add or remove a finite number of empty steps without affecting schedulability, we exclude them from schedules for succinctness.

A clock can memorize the number of ticks that it has made. We use *history* to represent the memorization.

**Definition 3 (History).** *Given a schedule $\delta$ for a set $C$ of clocks, a* history *of $\delta$ is a function $\chi_\delta : C \times \mathbb{N}^+ \to \mathbb{N}$ such that for each $c \in C$ and $i \in \mathbb{N}^+$:*

$$\chi\_{\delta}(c,i) = \begin{cases} 0, & \text{if } i = 1; \\ \chi\_{\delta}(c, i - 1), & \text{if } i > 1 \land c \notin \delta(i - 1); \\ \chi\_{\delta}(c, i - 1) + 1, & \text{if } i > 1 \land c \in \delta(i - 1). \end{cases}$$

$\chi_\delta(c, i)$ represents the number of ticks that the clock $c$ has made immediately before step $i$. (Note that the tick of $c$ at step $i$ is excluded from $\chi_\delta(c, i)$.) For simplicity, we may write $\chi$ for $\chi_\delta$ if it is clear from the context.
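As a minimal illustration of Definition 3, the following Python sketch computes the history of a clock over a finite prefix of a schedule. The list-of-sets representation and the names `Schedule` and `history` are assumptions made for this example and are not part of CCSL or of the tool described later.

```python
from typing import FrozenSet, List

# Assumed representation: element i-1 holds the set of clocks ticking at step i.
Schedule = List[FrozenSet[str]]


def history(prefix: Schedule, clock: str, step: int) -> int:
    """Return chi(clock, step): the ticks of `clock` strictly before `step` (1-based)."""
    if step < 1 or step > len(prefix) + 1:
        raise ValueError("step must lie in 1..len(prefix)+1")
    # Only steps 1..step-1 are counted, so the tick at `step` itself is excluded.
    return sum(1 for ticks in prefix[: step - 1] if clock in ticks)


prefix = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b"})]
print([history(prefix, "a", i) for i in range(1, 5)])  # [0, 1, 2, 2]
```

The slice `prefix[: step - 1]` mirrors the recursive definition: the history at step 1 is 0, and each later value adds one exactly when the clock ticked at the preceding step.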

#### **2.2 Syntax and Semantics of CCSL**

CCSL consists of 11 kinds of constraints. Four of them are binary relations specifying the *precedence*, *causality*, *subclocking*, and *exclusion* relations between clocks, and the others are used to define clocks from existing ones. Clocks defined by constraints may correspond to system events, or may be introduced merely as auxiliary clocks that correspond to no event.


**Table 1.** Semantics of CCSL with respect to schedules

**Definition 4 (Syntax).** *A* CCSL *constraint* φ *is defined by the following form:*


*where $b \ge 0$, $d \ge 0$ and $p > 0$ are natural numbers, $c_1$, $c_2$, $c_3$ are logical clocks and $w$ is a (possibly infinite) word over $\{0, 1\}$ expressed as an ($\omega$-)regular expression.*

To simplify the presentation, we denote by $c_1 \prec c_2$ the constraint $c_1 \mathrel{[0]{\prec}} c_2$, and by $c_1 \triangleq c_2 \mathbin{\$} d$ the constraint $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3$ such that $c_2 = c_3$.

The semantics of CCSL constraints is defined over schedules. Given a CCSL constraint φ and a schedule δ, the satisfiability relation δ |= φ (i.e., δ satisfies constraint φ) is defined in Table 1.

The precedence constraint $c_1 \prec c_2$ (i.e., $c_1 \mathrel{[0]{\prec}} c_2$) expresses that the clock $c_1$ precedes the clock $c_2$. Suppose there is an unbounded buffer with two operations *fetch* and *store*, which respectively fetch data from and store data into the buffer. A fetch is allowed only when the buffer is nonempty. If the buffer is initially empty, the store operation must strictly precede the fetch operation. This behavior can be expressed by the constraint *store* ≺ *fetch*. Likewise, the precedence constraint can be used to represent reentrant tasks by replacing *store* with *start* and *fetch* with *finish*.

The general precedence constraint $c_1 \mathrel{[b]{\prec}} c_2$ additionally specifies the difference $b$ between the numbers of occurrences of the two clocks before the precedence takes effect. Hence, it can express more complicated relations. For instance, if the buffer is initially nonempty, fetch operations can be performed prior to any store operation. Figure 1 shows such a scenario, where 4 elements are initially present in the buffer. This behavior can be represented as *store* [4]≺ *fetch*.
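The buffer reading of precedence can be sketched in Python as a check over a finite prefix of a schedule. The rule used here is an assumption derived from the prose above rather than a quotation of Table 1: the second clock may not tick at a step where it is already `b` ticks ahead of the first clock. The function name `satisfies_precedence` is hypothetical.

```python
def satisfies_precedence(prefix, c1, c2, b=0):
    """Check c1 [b]< c2 on a finite prefix (a list of sets of clock names).

    Assumed rule: whenever c2 is already b ticks ahead of c1, c2 must not tick.
    """
    h1 = h2 = 0  # histories of c1 and c2 before the current step
    for ticks in prefix:
        if h2 - h1 == b and c2 in ticks:
            return False
        h1 += c1 in ticks
        h2 += c2 in ticks
    return True


# Buffer with 4 initial elements: four fetches are fine, a fifth is not.
print(satisfies_precedence([{"fetch"}] * 4, "store", "fetch", b=4))  # True
print(satisfies_precedence([{"fetch"}] * 5, "store", "fetch", b=4))  # False
```

With `b=0` the check also rejects a step in which *store* and *fetch* tick together on an empty buffer, which reflects the strictness of the precedence.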

The causality, subclock and exclusion constraints are straightforward. The causality constraint $c_1 \preceq c_2$ specifies that the occurrence of $c_2$ must be caused by the occurrence of $c_1$, namely, at any moment $c_1$ must have ticked at least as many times as $c_2$ has. The subclock constraint $c_1 \subseteq c_2$ expresses that $c_1$ occurs at some step only if $c_2$ occurs at this step as well. The exclusion constraint $c_1 \mathrel{\#} c_2$ specifies that the two clocks $c_1$ and $c_2$ are exclusive, i.e., they cannot occur at the same step.

**Fig. 1.** Example for *store* [4]≺ *fetch*

The union and intersection constraints are used to define clocks. $c_1 \triangleq c_2 + c_3$ defines a clock $c_1$ such that $c_1$ ticks iff $c_2$ or $c_3$ ticks. Similarly, $c_1 \triangleq c_2 \ast c_3$ defines a clock $c_1$ such that $c_1$ ticks iff both $c_2$ and $c_3$ tick. The infimum (resp. supremum) constraint $c_1 \triangleq c_2 \wedge c_3$ (resp. $c_1 \triangleq c_2 \vee c_3$) defines a clock $c_1$ that is the slowest (resp. fastest) clock that is faster (resp. slower) than both $c_2$ and $c_3$. These two constraints are useful for expressing delay requirements between two events. Note that a clock $c_1$ defined by such a constraint may correspond to a system event; otherwise, it is an auxiliary clock. In the former case, the constraint can be seen as specifying a relation between the clocks $c_1$, $c_2$ and $c_3$.

The periodicity constraint $c_1 \triangleq c_2 \propto p$ defines a clock $c_1$ that ticks once every $p$ occurrences of clock $c_2$. It is worth mentioning that the periodicity constraint defined in such a way is relative because of the logical nature of CCSL clocks. That is, clock $c_1$ is relatively periodic with respect to clock $c_2$. Because CCSL does not assume the existence of a global reference clock, most relations are defined relative to other clocks. These notions extend the equivalent behaviors that are usually defined relative to physical time. If $c_2$ represents a sensor that measures physical time, then $c_1$ becomes physically periodic.

The filtering constraint $c_1 \triangleq c_2 \blacktriangledown w$ defines a clock $c_1$ that can be seen as snapshots of the clock $c_2$ at some steps according to the ($\omega$-)regular expression $w$. For instance, $c_1 \triangleq c_2 \blacktriangledown (01)^\omega$ expresses that $c_1$ simulates $c_2$ at every even step. It defines a logically periodic behavior of $c_1$ with respect to $c_2$.

The delayFor constraint $c_1 \triangleq c_2 \mathbin{\$} d$ (i.e., $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_2$) defines a new clock $c_1$ that is delayed with respect to the clock $c_2$ by $d$ steps. The general form $c_1 \triangleq c_2 \mathbin{\$} d \text{ on } c_3$ defines a new clock $c_1$ that is delayed with respect to $c_2$ by $d$ ticks of $c_3$. The clock $c_1$ can be seen as a *sampled* clock of $c_2$ on the basis of $c_3$. For instance, $c_1 \triangleq c_2 \mathbin{\$} 1 \text{ on } c_3$ denotes that whenever $c_2$ ticks at least once between two successive ticks of $c_3$ at steps $m$ and $n$, $c_1$ must tick at step $n$.

## **3 Scheduling Problem of CCSL**

#### **3.1 Schedulability**

Given a set Φ of CCSL constraints, a schedule δ satisfies Φ, denoted by δ |= Φ, iff δ |= φ for all constraints φ ∈ Φ.

**Fig. 2.** The unique schedule that satisfies the three constraints in the example

**Definition 5 (Logical time scheduling problem).** *Given a set* Φ *of* CCSL *constraints,* the (logical time) scheduling problem *of* CCSL *is to determine whether there exists a schedule* δ *such that* δ |= Φ*.*

We illustrate the scheduling problem with a simple example: specifying in CCSL the alternating flickering of a green light and a red light. We assume that the green light starts first. The timing requirements can be formalized by the following three constraints:

$$\textit{green} \prec \textit{red}, \qquad \textit{tmp} \triangleq \textit{green} \ \$\ 1, \qquad \textit{red} \prec \textit{tmp},$$

where *green* and *red* are clocks representing that the green (resp. red) light is turned on, and *tmp* is an auxiliary clock used to help specify the constraints.

There exists exactly one schedule satisfying the three constraints, as shown in Fig. 2. In this schedule, the clock *tmp* has the same behavior as *green* from step 2 onwards, while the clock *red* has the opposite behavior to *green*. That is, *red* and *green* tick alternately. For simplicity, we also write *green* ∼ *red* to denote the *alternation* relation of the two clocks.
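The uniqueness claim can be checked on a finite prefix by brute force. The Python sketch below enumerates every sequence of 4 non-empty steps over the three clocks and keeps those satisfying the constraints. The per-step rules are assumptions inferred from the informal descriptions in Sect. 2.2, not a transcription of Table 1, and the names `satisfies` and `bounded_schedules` are hypothetical.

```python
from itertools import combinations, product

CLOCKS = ("green", "red", "tmp")
# All non-empty subsets of CLOCKS: the possible contents of a single step.
STEPS = [frozenset(s) for r in range(1, len(CLOCKS) + 1)
         for s in combinations(CLOCKS, r)]


def satisfies(prefix):
    """Check the three light constraints on a finite prefix (assumed semantics)."""
    h = dict.fromkeys(CLOCKS, 0)  # ticks made strictly before the current step
    for ticks in prefix:
        if h["green"] == h["red"] and "red" in ticks:  # green precedes red
            return False
        if h["red"] == h["tmp"] and "tmp" in ticks:    # red precedes tmp
            return False
        # tmp = green $ 1: tmp ticks with every tick of green except the first.
        if ("tmp" in ticks) != ("green" in ticks and h["green"] >= 1):
            return False
        for c in ticks:
            h[c] += 1
    return True


def bounded_schedules(k):
    """Return all length-k prefixes that satisfy the constraints."""
    return [p for p in product(STEPS, repeat=k) if satisfies(p)]


found = bounded_schedules(4)
print(len(found))                             # 1
print([sorted(step) for step in found[0]])
# [['green'], ['red'], ['green', 'tmp'], ['red']]
```

Under these assumptions, exactly one of the 7<sup>4</sup> = 2401 candidate prefixes survives, and it alternates *green* and *red* as described. Exhaustive enumeration is only feasible for tiny instances.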

Although one may be able to find one or more schedules for some simple constraints, to our knowledge, there is no generally applicable decision procedure solving the scheduling problem of full CCSL. There are two main challenges. First, schedules are essentially *infinite*, i.e., defined on all the natural numbers. Second, the *precedence* is *stateful*, i.e., it depends on the history, and there is no upper bound on how far in the history one must go back. It may then require an infinite memory to store the history. As a first step to tackle this challenging problem, in this work, we first consider the *bounded* scheduling problem.

#### **3.2 Bounded Scheduling Problem**

Given a bound $k \in \mathbb{N}^+$, let $\sigma : \mathbb{N}^+_{\le k} \to 2^C$ be a function, where $\mathbb{N}^+_{\le k} := \{1, \dots, k\}$. The function $\sigma$ is a $k$*-bounded schedule* of a set $\Phi$ of CCSL constraints, denoted by $\sigma \models_k \Phi$, iff there exists a schedule $\delta$ such that $\delta(i) = \sigma(i)$ for every $i \in \mathbb{N}^+_{\le k}$ and $\delta \models \Phi$ from step 1 up to $k$.

**Definition 6 (Bounded scheduling problem).** *The* bounded scheduling problem *is to determine, for a given set $\Phi$ of* CCSL *constraints and a bound $k$, whether there is a $k$-bounded schedule $\sigma$ for $\Phi$, i.e., $\sigma \models_k \Phi$.*

**Theorem 1 (Sufficient condition of unschedulability).** *If a set $\Phi$ of constraints has no $k$-bounded schedule for some $k \in \mathbb{N}^+$, then $\Phi$ is unschedulable.*

The proof is straightforward by contradiction.

It is easy to see that the bounded scheduling problem is decidable, as there are finitely many potential $k$-bounded schedules, namely $(2^{|C|} - 1)^k$, where $|C|$ denotes the number of clocks. Furthermore, the satisfiability problem of Boolean formulas can be reduced to the bounded scheduling problem in polynomial time.

**Theorem 2.** *The* k*-bounded scheduling problem of* CCSL *is* NP*-complete, even if* k = 1*.*

*Proof.* The NP upper bound can be proved easily based on the facts that the number of possible $k$-bounded schedules is finite and that the universal quantification $\forall n \in \mathbb{N}^+_{\le k}$ can be eliminated by enumerating all the possible values in $\mathbb{N}^+_{\le k}$.

We prove NP-hardness by a reduction from the satisfiability problem of Boolean formulas, which is known to be NP-complete. Consider the Boolean formula $\phi = \bigwedge_{i=1}^{m} (l_i^1 \vee l_i^2 \vee l_i^3)$, where $m \in \mathbb{N}^+$ and each $l_i^j$ for $j \in \{1, 2, 3\}$ is either a Boolean variable $x$ or its negation $\neg x$. Let $\mathsf{Var}(\phi)$ denote the set of Boolean variables appearing in $\phi$. We construct a set of CCSL constraints $\Phi$ as follows.

For each $x \in \mathsf{Var}(\phi)$, we have two clocks $x^+$ and $x^-$. Let $\mathsf{enc}(x) = x^+$ and $\mathsf{enc}(\neg x) = x^-$. Each clause $l_i^1 \vee l_i^2 \vee l_i^3$ in $\phi$ is encoded as the CCSL constraint $c_i \triangleq \mathsf{enc}(l_i^1) + \mathsf{enc}(l_i^2) + \mathsf{enc}(l_i^3)$, denoted by $\psi_i$. Note that $\psi_i$ can be transformed into CCSL constraints by introducing one auxiliary clock $c$, i.e., $\{c_i \triangleq \mathsf{enc}(l_i^1) + \mathsf{enc}(l_i^2) + \mathsf{enc}(l_i^3)\} \equiv \{c_i \triangleq \mathsf{enc}(l_i^1) + c,\ c \triangleq \mathsf{enc}(l_i^2) + \mathsf{enc}(l_i^3)\}$.

Let $\mathsf{enc}(\phi)$ denote the following set of CCSL constraints:

$$\{\mathbf{1} \triangleq \mathop{\ast}_{i=1}^{m} c_i,\ \psi_1, \dots, \psi_m,\ x^+ \mathrel{\#} x^-,\ \mathbf{1} \triangleq x^+ + x^- \mid x \in \mathsf{Var}(\phi)\}$$

where $x^+ \mathrel{\#} x^-$ and $\mathbf{1} \triangleq x^+ + x^-$ enforce that either $x^+$ or $x^-$ ticks at each step, but not both. This encodes that either $x$ is true or $\neg x$ is true. Note that $\tau \triangleq \mathop{\ast}_{i=1}^{m} c_i$ is a shorthand for $\tau \triangleq c_1 \ast \cdots \ast c_m$, and can also be expressed in CCSL constraints by introducing a polynomial number of auxiliary clocks. For instance, $\{c \triangleq c_1 \ast c_2 \ast c_3\} \equiv \{c \triangleq c_1 \ast c',\ c' \triangleq c_2 \ast c_3\}$. We can show that $\phi$ is satisfiable iff $\mathsf{enc}(\phi)$ is 1-bounded schedulable. Since the satisfiability problem of Boolean formulas is NP-complete, the 1-bounded scheduling problem of CCSL is NP-hard. The result for the $k$-bounded scheduling problem with $k > 1$ follows immediately by repeating the ticks of the clocks at the first step.
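The correspondence between truth assignments and first steps can be made concrete with a small Python sketch. It searches for a single step that satisfies the constraints of the encoding, using the informal reading of union, intersection and exclusion given in Sect. 2.2. The clause representation and the names `enc` and `one_bounded_schedulable` are assumptions made for illustration; the exhaustive search stands in for a solver and is exponential in the number of variables.

```python
from itertools import product


def enc(literal):
    """Map a literal (variable, polarity) to its clock name, e.g. ("x", False) -> "x-"."""
    var, positive = literal
    return var + ("+" if positive else "-")


def one_bounded_schedulable(clauses):
    """Return a first step satisfying enc(phi), or None if there is none."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    for choice in product([True, False], repeat=len(variables)):
        # Exclusion plus union with the reference clock: exactly one of x+, x- ticks.
        step = {enc((v, val)) for v, val in zip(variables, choice)}
        # Clause clock c_i ticks iff at least one literal clock of clause i ticks.
        clause_ticks = [any(enc(l) in step for l in clause) for clause in clauses]
        # The reference clock equals the intersection of all c_i, so each must tick.
        if all(clause_ticks):
            return step | {"c%d" % (i + 1) for i in range(len(clauses))}
    return None


sat = [[("x", True), ("y", True), ("z", False)],
       [("x", False), ("y", False), ("z", True)]]
unsat = [[("x", True)] * 3, [("x", False)] * 3]
print(one_bounded_schedulable(sat) is not None)  # True
print(one_bounded_schedulable(unsat))            # None
```

Each candidate step fixes which of $x^+$ and $x^-$ ticks, which is exactly a truth assignment, and the step is accepted precisely when every clause clock ticks.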

Theorem 2 indicates the time complexity of the bounded scheduling problem. Thus, we need to find practical solutions that are algorithmically efficient for it. In the next section, we propose an SMT-based decision procedure for the bounded scheduling problem and a sound algorithm for the scheduling problem. Thanks to advances in state-of-the-art SMT solvers such as Z3 [25], our approach is usually efficient in practice.

## **4 Decision Procedure for the Scheduling Problem**

#### **4.1 Transformation from CCSL into SMT**

Let us fix a set $\Phi$ of CCSL constraints defined over a set $C$ of clocks. Each clock $c \in C$ is interpreted as a predicate $t_c : \mathbb{N}^+ \to \mathsf{Bool}$ such that for all $i \in \mathbb{N}^+$, $t_c(i)$ is true iff the clock $c$ ticks at $i$, where $\mathsf{Bool}$ denotes the Boolean sort. A schedule $\delta$ of $\Phi$ is encoded as a set of predicates $T_C = \{t_c \mid c \in C\}$ such that the following condition holds: for all $t_c \in T_C$,

$$
\forall i \in \mathbb{N}^+. t\_c(i) \Leftrightarrow c \in \delta(i).
$$

Recall that schedules forbid stuttering steps. This requirement is enforced by restricting the predicates $t_c$ in $T_C$ to satisfy the following condition:

$$\forall i \in \mathbb{N}^+.\ \bigvee_{c \in C} t_c(i) \tag{F1}$$

Formula F1 specifies that at each step $i$ at least one clock $c$ ticks, i.e., $t_c(i)$ holds.

For each clock $c \in C$, we introduce an auxiliary function $h_c : \mathbb{N}^+ \to \mathbb{N}$ to encode its history. For each $i \in \mathbb{N}^+$,

$$h\_c(i) := \begin{cases} 0, & \text{if } i = 1; \\ h\_c(i-1), & \text{if } i > 1 \land \neg t\_c(i-1); \\ h\_c(i-1) + 1, & \text{if } i > 1 \land t\_c(i-1). \end{cases} \tag{F2}$$

Intuitively, $h_c(i)$ is equivalent to $\chi(c, i)$ for each $i \in \mathbb{N}^+$. The set of all the auxiliary functions is denoted by $H_C$.

By replacing each occurrence of $c \in \delta(n)$ (resp. $c \notin \delta(n)$) with $t_c(n)$ (resp. $\neg t_c(n)$) and $\chi(c, n)$ with $h_c(n)$ in the definition of each CCSL constraint, each CCSL constraint $\phi$ can be encoded as an SMT formula $\lceil \phi \rceil$.

We use $\lceil \Phi \rceil$ to denote the conjunction of Formulas F1, F2 and the SMT encodings of the CCSL constraints in $\Phi$. Formally,

$$\lceil \Phi \rceil := \mathbf{F1} \land \mathbf{F2} \land \Big(\bigwedge_{\phi \in \Phi} \lceil \phi \rceil\Big).$$

Finding a schedule for $\Phi$ amounts to finding a solution, i.e., definitions of the predicates in $T_C$, which satisfies $\lceil \Phi \rceil$.

**Proposition 1.** $\Phi$ *has a schedule iff* $\lceil \Phi \rceil$ *is satisfiable.*

The scheduling problem of $\Phi$ is thus transformed into the satisfiability problem of the formula $\lceil \Phi \rceil$. However, according to the SMT-LIB standard [4], $\lceil \Phi \rceil$ belongs to the logic UFLIA (formulas with Uninterpreted Functions and Linear Integer Arithmetic), whose satisfiability problem is undecidable in general. Nevertheless, the SMT encoding is still useful for solving the bounded scheduling problem, which we present in the next subsection.

#### **4.2 Decision Procedure for the Bounded Scheduling Problem**

For the $k$-bounded scheduling problem, it suffices to consider schedules $\delta : \mathbb{N}^+_{\le k} \to 2^C$. Moreover, the quantifiers in $\lceil \Phi \rceil$ can be eliminated once the bound $k$ is fixed. Hence, we can resort to state-of-the-art SMT solvers. Formally, let $\lceil \Phi \rceil_k$ be the formula obtained from $\lceil \Phi \rceil = \mathbf{F1} \land \mathbf{F2} \land (\bigwedge_{\phi \in \Phi} \lceil \phi \rceil)$ by


**Proposition 2.** $\Phi$ *is $k$-bounded schedulable iff* $\lceil \Phi \rceil_k$ *is satisfiable. Moreover, if* $\lceil \Phi \rceil_k$ *is satisfiable, then* $\lceil \Phi \rceil_{k'}$ *is satisfiable for all* $k' \le k$*.*
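One way to picture quantifier elimination for a fixed bound is to unroll F1 and F2 into one assertion per step. The Python sketch below emits such an unrolling as SMT-LIB text. It is an assumption about how the bounded formula might look, not the construction used in the paper: the constant names `t_c_i` and `h_c_i`, the function name `unroll_f1_f2`, and the choice of the quantifier-free logic `QF_LIA` are all illustrative, and the encodings of the individual CCSL constraints are omitted.

```python
def unroll_f1_f2(clocks, k):
    """Emit SMT-LIB assertions for F1 and F2 restricted to steps 1..k."""
    lines = ["(set-logic QF_LIA)"]
    for c in clocks:
        for i in range(1, k + 1):
            lines.append(f"(declare-const t_{c}_{i} Bool)")  # c ticks at step i
            lines.append(f"(declare-const h_{c}_{i} Int)")   # history of c before step i
    for i in range(1, k + 1):
        # F1: at least one clock ticks at step i (a one-argument `or` is avoided).
        ticks = " ".join(f"t_{c}_{i}" for c in clocks)
        lines.append(f"(assert (or {ticks}))" if len(clocks) > 1 else f"(assert {ticks})")
    for c in clocks:
        # F2: the history is 0 at step 1 and grows by one after each tick.
        lines.append(f"(assert (= h_{c}_1 0))")
        for i in range(2, k + 1):
            lines.append(
                f"(assert (= h_{c}_{i} (ite t_{c}_{i - 1} (+ h_{c}_{i - 1} 1) h_{c}_{i - 1})))"
            )
    lines.append("(check-sat)")
    return "\n".join(lines)


print(unroll_f1_f2(["a", "b"], 2))
```

The resulting text contains only Boolean and integer constants, so no quantifiers or uninterpreted functions remain. A full encoding would add one unrolled assertion group per CCSL constraint.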

#### **4.3 A Sound Algorithm for the Scheduling Problem**

According to Theorem 1 and Propositions 1 and 2, (1) if $\lceil \Phi \rceil$ is satisfiable, then $\Phi$ is schedulable, and (2) if $\lceil \Phi \rceil_k$ is unsatisfiable for some $k \in \mathbb{N}^+$, then $\Phi$ is unschedulable. From these two facts we can derive a sound algorithm for the general scheduling problem. However, randomly choosing a bound $k$ and checking whether $\lceil \Phi \rceil_k$ is unsatisfiable may be inefficient: the $k$-bounded scheduling problem is NP-hard (cf. Theorem 2), a larger bound $k$ may cause the solver to time out, and a smaller bound $k$ may yield a satisfiable $\lceil \Phi \rceil_k$, which is inconclusive. Indeed, for a maximal bound $B$, the random approach may have to call the SMT solver $O(B)$ times. Instead, we propose a binary-search based approach, shown in Algorithm 1, which for a given maximal bound $B$ invokes the SMT solver at most $O(\log_2 B)$ times.

**Algorithm 1:** A sound algorithm for the scheduling problem

**Input:** a set of constraints $\Phi$, a timeout threshold $T$, a maximal bound $B$

**Output:** $\{\mathsf{SAT}, \mathsf{UNSAT}, \mathsf{Timeout}\} \times \mathbb{N}^+$

$$\begin{array}{rl}
1 & \mathit{result}_1 \leftarrow \mathrm{SMTSolver}(\lceil \Phi \rceil, T); \\
2 & \textbf{if } \mathit{result}_1 = \mathsf{SAT} \textbf{ then} \qquad \text{(schedulable)} \\
3 & \quad \textbf{return } (\mathsf{SAT}, 0) \\
4 & l \leftarrow 0;\ u \leftarrow B; \\
5 & \textbf{while } l \le u \textbf{ do} \qquad \text{(binary search)} \\
6 & \quad k \leftarrow \lfloor (l + u)/2 \rfloor; \\
7 & \quad \mathit{result}_2 \leftarrow \mathrm{SMTSolver}(\lceil \Phi \rceil_k, T); \\
8 & \quad \textbf{if } \mathit{result}_2 = \mathsf{SAT} \textbf{ then } l \leftarrow k + 1; \qquad \text{(upper half)} \\
9 & \quad \textbf{else} \qquad \text{(lower half)} \\
10 & \quad\quad u \leftarrow k - 1; \\
11 & \textbf{if } \mathit{result}_1 = \mathsf{UNSAT} \lor \mathit{result}_2 = \mathsf{UNSAT} \textbf{ then} \\
12 & \quad \mathit{result}_1 \leftarrow \mathsf{UNSAT}; \\
13 & \textbf{if } \mathit{result}_2 \neq \mathsf{SAT} \textbf{ then } k \leftarrow k - 1; \\
14 & \textbf{return } (\mathit{result}_1, k);
\end{array}$$

Given a set $\Phi$ of CCSL constraints, a timeout threshold $T$ and a maximal bound $B$, Algorithm 1 first invokes an SMT solver to decide whether $\lceil \Phi \rceil$ is satisfiable within time $T$. If $\lceil \Phi \rceil$ is satisfiable, then Algorithm 1 returns (SAT, 0), meaning that $\Phi$ is schedulable. Otherwise, it performs a binary search for a bound $k \le B$ such that $\lceil \Phi \rceil_k$ is satisfiable while $\lceil \Phi \rceil_{k+1}$ (if $k + 1 \le B$) is unsatisfiable or cannot be decided within time $T$.
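The control flow of the binary search can be sketched in Python with the solver abstracted as a callback. This is one reading of Algorithm 1, under two assumptions: the final checks run after the loop, and the returned bound is decremented when the last bounded query was not satisfiable, so that it matches the description above. The callback `solve` and the function name `sound_schedulability` are hypothetical; `solve(None)` stands for the unbounded query and `solve(k)` for the $k$-bounded one, each returning `"SAT"`, `"UNSAT"` or `"Timeout"`.

```python
from typing import Callable, Optional, Tuple

SAT, UNSAT, TIMEOUT = "SAT", "UNSAT", "Timeout"


def sound_schedulability(solve: Callable[[Optional[int]], str],
                         max_bound: int) -> Tuple[str, int]:
    """Binary search for the largest bound whose bounded formula is satisfiable."""
    result1 = solve(None)            # unbounded formula
    if result1 == SAT:
        return SAT, 0                # schedulable
    low, high = 0, max_bound
    k, result2 = 0, TIMEOUT
    while low <= high:
        k = (low + high) // 2
        result2 = solve(k)           # k-bounded formula
        if result2 == SAT:
            low = k + 1              # search the upper half
        else:
            high = k - 1             # search the lower half
    if result1 == UNSAT or result2 == UNSAT:
        result1 = UNSAT
    if result2 != SAT:
        k -= 1                       # the last tested bound was not satisfiable
    return result1, k


def stub(k):
    """Toy solver: the unbounded query times out, bounds up to 5 are satisfiable."""
    if k is None:
        return TIMEOUT
    return SAT if k <= 5 else UNSAT


print(sound_schedulability(stub, 20))  # ('UNSAT', 5)
```

With the toy solver and a maximal bound of 20, the search queries the bounds 10, 4, 7, 5 and 6, that is, five bounded calls rather than up to twenty.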

**Theorem 3.** *Algorithm 1 has the following three properties:*


## **5 Case Study and Performance Evaluation**

We implemented our approach in a prototype tool with Z3 [25] as its underlying SMT solver. We conduct a case study on expressing requirements of an interlocking system in CCSL constraints and analyzing its schedulability. Then, we prove 12 algebraic properties of CCSL constraints using the tool. Finally, we evaluate the performance of the tool using 9 sets of CCSL constraints.

#### **5.1 Schedulability of an Interlocking System**

The interlocking system is a subsystem of a rail transit system. It is used to prevent trains from collisions and derailments when they are moving under the control of signal lights. As shown in Fig. 3, the interlocking system monitors the occupancy status of the individual track sections, and sends signals to inform drivers whether they are allowed to enter the route or not. The railway tracks are divided into sections. Each section is associated with a track circuit for detecting whether it is occupied by a train or not. Signal lights are placed between track sections. They can be red or green, indicating stopping and proceeding, respectively.

**Fig. 3.** Interlocking system

The mechanism and operation procedure of the interlocking system are summarized as follows.



**Table 2.** CCSL constraints of the interlocking system


There are time constraints on the above operations. For instance, the control center needs to get a response from the track circuit within 30 ms after sending an inquiry to it. The train must make a decision within 50 ms after it sends a request to the control center. The light should turn to the corresponding color within 30 ms after it receives a pulse. After the track becomes occupied (*resp.* unoccupied), the light must turn red (*resp.* green) within 40 ms.

Table 2 shows the main logical constraints on the operations in the system and their timing constraints. We use some non-standard constraint expressions for the sake of compactness. Constraint a − b ≤ n denotes that b must tick within n steps after a ticks. It is equivalent to the following set of three constraints:

$$a \prec b, \quad t \triangleq a \mathbin{\$} n \text{ on } \mathbf{1}, \quad b \prec t.$$

Note that in this example the unit of time is the millisecond (ms). Thus, there is an implicit assumption in the constraints that every tick of a logical clock means the elapse of one millisecond.
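The decomposition of a − b ≤ n into three constraints can be checked on a finite schedule with a few lines of Python. This is only a sketch under stated assumptions: precedence is read as "b may not tick at a step where its history equals that of a", the auxiliary clock t is read as a delayed by n ticks of a base clock that ticks at every step, and the names `within`, `delayed` and `__t` are hypothetical and not taken from the tool.

```python
from typing import List, Set

Schedule = List[Set[str]]  # schedule[i] is the set of clocks ticking at (0-based) step i


def history(schedule: Schedule, clock: str, step: int) -> int:
    """Number of ticks of `clock` strictly before the given step."""
    return sum(1 for ticks in schedule[:step] if clock in ticks)


def precedes(schedule: Schedule, a: str, b: str) -> bool:
    """Assumed reading of a ≺ b: b may not tick while it has caught up with a."""
    return all(
        not (history(schedule, a, i) == history(schedule, b, i) and b in schedule[i])
        for i in range(len(schedule))
    )


def delayed(schedule: Schedule, a: str, n: int) -> List[bool]:
    """Ticks of the auxiliary clock t, assuming the base clock ticks at every step."""
    return [i >= n and a in schedule[i - n] for i in range(len(schedule))]


def within(schedule: Schedule, a: str, b: str, n: int) -> bool:
    """Check a - b <= n through the three constraints a ≺ b, t = a delayed by n, b ≺ t."""
    t = "__t"  # hypothetical fresh clock name
    extended = [
        set(ticks) | ({t} if tick_t else set())
        for ticks, tick_t in zip(schedule, delayed(schedule, a, n))
    ]
    return precedes(extended, a, b) and precedes(extended, b, t)


in_time = [{"a"}, set(), {"b"}, set(), set(), set()]
too_late = [{"a"}, set(), set(), set(), {"b"}, set()]
print(within(in_time, "a", "b", 3), within(too_late, "a", "b", 3))
```

Under this reading, b has to tick strictly before the delayed clock t does, so the second schedule is rejected because t ticks at step 3 while b has not yet answered.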

**Fig. 4.** A bounded schedule for the CCSL constraints in the case study

Most constraints in Table 2 are straightforward, except the six constraints marked with waved underlines. The first three constraints specify that checkFail can only occur between the occurrences of getUnoccupied and getOccupied. The others specify the following two requirements:


Given these constraints, our tool found a bounded schedule as depicted in Fig. 4. From step 1 to step 7, one complete process is finished. Initially, the track gets unoccupied. At step 2, a request is made, which causes subsequent operations to occur from step 3 to step 7. At step 29, a fail case occurs because another train enters (step 26) but has not left (step 31). The train that made the request has to wait (step 33).

If we extend the bounded schedule by infinitely repeating the behaviors of all the clocks between steps 51 and 69 from step 70 onwards, we obtain an infinite schedule. The extended schedule satisfies all the constraints, and thus it is a witness of the schedulability of the designed mechanism for the interlocking system.
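The extension step can be pictured as unrolling a lasso: a finite prefix followed by a segment that repeats forever. The sketch below is illustrative only; the function names `extend` and `ticks_at` are hypothetical, steps are 1-based as in the text, and the code does not check that the resulting infinite schedule satisfies the constraints, which has to be established separately.

```python
from typing import Callable, List, Set


def extend(schedule: List[Set[str]], loop_start: int, loop_end: int) -> Callable[[int], Set[str]]:
    """Return a lookup for an infinite schedule that repeats the steps
    loop_start..loop_end (1-based, inclusive) forever after loop_end."""
    period = loop_end - loop_start + 1

    def ticks_at(step: int) -> Set[str]:
        if step <= loop_end:
            return schedule[step - 1]
        # Map a later step back into the repeated segment.
        return schedule[loop_start - 1 + (step - loop_end - 1) % period]

    return ticks_at


# Toy schedule with five steps whose last three steps are repeated.
toy = [{"a"}, {"b"}, {"c"}, {"d"}, {"e"}]
ticks_at = extend(toy, loop_start=3, loop_end=5)
print([sorted(ticks_at(s)) for s in range(1, 10)])
```

With `loop_start=51` and `loop_end=69`, step 70 maps back to step 51 and step 89 maps back to step 51 again, which matches the repetition described above.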

In this paper, we are only concerned with the schedulability of the constraints in the example. Some other kinds of temporal properties also need to be verified. For instance, we must guarantee that whenever a train requests to enter the station, it eventually enters. We also need to verify that the system is deadlock-free. Such temporal properties can be verified by LTL model checking of CCSL constraints using SMT techniques [40]. We omit this because it is beyond the scope of this paper.

#### **5.2 Automatic Proof of CCSL Algebraic Properties**

Using the proposed approach, we can also automatically prove algebraic properties of CCSL constraints, such as the commutativity of exclusion and the transitivity of causality. Algebraic properties of CCSL constraints can be represented as Φ ⇒ φ, where Φ is a set of CCSL constraints and φ is a constraint derived from Φ. Proving that Φ ⇒ φ is valid amounts to proving the unsatisfiability of the encoding of Φ ∧ ¬φ, which can be solved by Algorithm 1.


**Table 3.** Proved algebraic properties of CCSL constraints

Let us consider the proof of the slowestness of infimum as an example. The slowestness of infimum means that an infimum constraint c<sub>1</sub> ≜ c<sub>2</sub> ∧ c<sub>3</sub> defines the slowest clock c<sub>1</sub> among those that are faster than both c<sub>2</sub> and c<sub>3</sub>.

**Proposition 3 (Slowestness of infimum).** *Given two clocks* c<sub>2</sub>, c<sub>3</sub>*, let* c<sub>1</sub> ≜ c<sub>2</sub> ∧ c<sub>3</sub> *and* c<sub>4</sub> *be an arbitrary clock such that* c<sub>4</sub> ≺ c<sub>2</sub> *and* c<sub>4</sub> ≺ c<sub>3</sub>*, then* c<sub>4</sub> ≺ c<sub>1</sub>*.*

This is proved by transforming the CCSL constraints into the following SMT formula according to the SMT encoding method:

$$\{c\_1 \triangleq c\_2 \land c\_3\} \land \{c\_4 \prec c\_2\} \land \{c\_4 \prec c\_3\} \land \neg\{c\_4 \prec c\_1\}.$$

Algorithm 1 returns (UNSAT, 0), which means that the formula is proved unsatisfiable. The proposition is proved.
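To make the refutation idea concrete, the following Python sketch searches exhaustively for a counterexample to the proposition among all schedules up to a small bound. It is a brute-force stand-in for the SMT query, not the encoding used by the tool, and it only covers bounded schedules, so it illustrates the proof obligation rather than proving it. The reading of the infimum (its history is the maximum of the two histories) and of precedence, as well as the names `counterexample` and `use_c3_premise`, are assumptions made for this illustration.

```python
from itertools import product
from typing import List, Optional, Tuple


def hist(ticks: List[bool], i: int) -> int:
    """Number of ticks before step i."""
    return sum(ticks[:i])


def precedes(a: List[bool], b: List[bool]) -> bool:
    """Assumed reading of a ≺ b: b may not tick when its history equals that of a."""
    return all(not (hist(a, i) == hist(b, i) and b[i]) for i in range(len(a)))


def infimum(c2: List[bool], c3: List[bool]) -> List[bool]:
    """Ticks of c1, assuming its history is the maximum of the histories of c2 and c3."""
    k = len(c2)
    h = [max(hist(c2, i), hist(c3, i)) for i in range(k + 1)]
    return [h[i + 1] > h[i] for i in range(k)]


def counterexample(bound: int, use_c3_premise: bool = True) -> Optional[Tuple[list, list, list]]:
    """Search all schedules of c2, c3, c4 of length `bound` for a violation."""
    for bits in product([False, True], repeat=3 * bound):
        c2, c3, c4 = (list(bits[j * bound:(j + 1) * bound]) for j in range(3))
        c1 = infimum(c2, c3)
        premises = precedes(c4, c2) and (precedes(c4, c3) or not use_c3_premise)
        if premises and not precedes(c4, c1):
            return c2, c3, c4
    return None


print(counterexample(4))                        # no violation up to bound 4
print(counterexample(3, use_c3_premise=False))  # dropping a premise admits a violation
```

Dropping the premise on c<sub>3</sub> makes the search succeed, which shows that the exhaustive check is not vacuous.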

Table 3 lists the algebraic properties that have been successfully proved in our approach. Algebraic properties are useful to help understand the relation among CCSL constraints. Using them we can also verify whether some CCSL constraints are redundant or inconsistent for a given set of CCSL constraints.

#### **5.3 Performance Evaluation**

To evaluate the performance of our tool, we collected 9 sets of CCSL constraints from the literature and real-world applications, and analyzed their schedulability using our tool. Under different time thresholds, we calculate the maximal bounds under which the constraints are schedulable.

Table 4 shows all the experimental results, including the corresponding execution times. All the experiments were conducted on a Windows 10 machine with a 2.70 GHz i7 CPU and 16 GB of memory. The numbers followed by asterisks are the maximal bounds such that the corresponding constraints are bounded schedulable, but unschedulable in the next step.

**Table 4.** Experimental results of bounded schedulability analysis

*Remarks:* CS: constraint set, Cons: the number of constraints, Clks: the number of clocks, THD: timeout threshold, TM: time (seconds), BD: upper bound.

It is interesting to observe from Table 4 that the time cost is only loosely related to the size (the number of clocks and constraints), thanks to the efficient search strategies of SMT solvers. This is in striking contrast to the automata-based [29,35] and rewriting-based [38] approaches, whose scalability suffers from both the number of clocks and the number of constraints.

## **6 Related Work**

CCSL is directly derived from the family of synchronous languages, such as Lustre [9], Esterel [6] and Signal [5], and the scheduling problem of CCSL is akin to what synchronous languages call clock calculus. The main differences are: CCSL is a specification language, while the others are programming languages; and CCSL partially describes what is expected to happen in a declarative way and does not give a direct operational deterministic description of what must happen. Furthermore, CCSL only deals with pure clocks while the others deal with signals and extract the clocks when needed.

The Esterel compiler [31] applies a constructive approach to decide when a signal must occur (compute its clock) and what its value should be. This requires a detection of *causality cycles*, or intra-cycle data dependencies, which are also naturally addressed by our approach. However, the Esterel compiler compiles an imperative program into a Boolean circuit, or equivalently a finite state machine. Consequently, it cannot deal with CCSL unbounded schedules.

The clock calculus in Signal attempts to detect whether the specification is endochronous [30], in which case it can generate some efficient code. This analysis is mainly based on the subclock relationship that also exists in CCSL. In CCSL, we consider the problem whether there is at least one possible schedule or not.

In Lustre and its extensions, clocks are regarded as abstract types [13] and the clock calculus computes the relative rates of clocks while rejecting the program when computing the rates is not possible. In most cases, the compiler attempts to build bounded buffers and to ensure that the functional determinism can be preserved with a finite memory. In our case, we do not seek to reach a finite representation, as in the first specification steps this is not a primary goal for the designers. Indeed, this might lead to an over-specification of the problem.

Classical real-time scheduling [32] usually relies on task models, arrival patterns and constraints (e.g., precedence, resources) to propose algorithms for the scheduling problem, with analytical results [19] or heuristics depending on the specific model (e.g., priorities, preemption). Other solutions, based on timed automata [1,2,17] or timed Petri nets [8,18], propose a general framework for describing all the relevant aspects without assuming a specific task model. CCSL offers an alternative method based on logical time. It is believed that logical time and multiform time bases offer some flexibility to unify functional requirements and performance constraints. We rely on CCSL and we claim that after encoding a task model in CCSL, finding a schedule for the CCSL model also gives a schedule for the encoded task model [24].

There have been many efforts towards the scheduling problem of CCSL, though no conclusion has been drawn on its decidability. TimeSquare [14] is a simulation tool for CCSL which can produce a possible schedule for a given set of CCSL constraints, up to a given user-defined bound. It also supports different simulation strategies for producing desired execution traces. Some earlier work [20] defines the notion of *safe* CCSL specifications that can be encoded with a finite-state machine. The scheduling problem is decidable for safe specifications, as one can merely enumerate all the (finite) solutions. A semi-algorithm can build the finite representation when the specification is safe [21]. In [37], Zhang et al. proposed a state-based approach and a sufficient condition to decide whether safe and unsafe specifications accept a so-called *periodic schedule* [39]. This allows building a finite solution for unsafe specifications, while there may also exist infinite solutions. Xu et al. proposed a notion of *divergence* of CCSL to study the schedulability of CCSL, and proved that a set of CCSL constraints is schedulable if all the constraints are divergent [34]. They resorted to the theorem prover PVS [27] to assist the divergence proof.

The scheduling problem of CCSL constraints in this work resorts to SMT solving to deal with the bounded and unbounded schedules. Using SMT solving has two advantages: (1) it is usually efficient in practice, and (2) it can deal with unsafe CCSL constraints such as infimum and supremum [21].

Some basic algebraic properties on CCSL relations have been established manually before [23] but we provide here an automatic framework to do so.

## **7 Conclusion and Future Work**

In this work, we proved that the bounded scheduling problem of CCSL is NP-complete, and proposed an SMT-based decision procedure for the bounded scheduling problem. The procedure is sound and complete. The experimental results also show its efficiency in practice. Based on this decision procedure, we devised a sound algorithm for the general scheduling problem. We evaluated the effectiveness of the proposed approach on an interlocking system. We also showed our approach can be used to prove algebraic properties of CCSL constraints.

Our approach to the bounded scheduling problem of CCSL brings us one step closer to tackling the general (i.e., unbounded) scheduling problem. As the case study demonstrates, one may find an infinite schedule by extending a bounded one such that the extended infinite schedule still satisfies the constraints. This observation inspires future work to investigate mechanisms for finding such bounded schedules, hopefully with SMT solvers by extending our algorithm. In our earlier work [37], we proposed a similar approach to search for periodic schedules in bounded steps. In that approach, CCSL constraints are transformed into a finite-state machine, and the approach consequently suffers from the state explosion problem. We believe our SMT-based approach can be extended to that setting while still avoiding state explosion. We leave this to future work.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Hybrid Dynamic Logic for Event/Data-Based Systems**

Rolf Hennicker<sup>1</sup>, Alexandre Madeira<sup>2,3(B)</sup>, and Alexander Knapp<sup>4</sup>

<sup>1</sup> Ludwig-Maximilians-Universität München, Munich, Germany hennicke@pst.ifi.lmu.de <sup>2</sup> CIDMA, University of Aveiro, Aveiro, Portugal madeira@ua.pt <sup>3</sup> QuantaLab, University of Minho, Braga, Portugal <sup>4</sup> Universität Augsburg, Augsburg, Germany knapp@informatik.uni-augsburg.de

**Abstract.** We propose E<sup>↓</sup>-logic as a formal foundation for the specification and development of event-based systems with local data states. The logic is intended to cover a broad range of abstraction levels from abstract requirements specifications up to constructive specifications. Our logic uses diamond and box modalities over structured actions adopted from dynamic logic. Atomic actions are pairs e//ψ where e is an event and ψ a state transition predicate capturing the allowed reactions to the event. To write concrete specifications of recursive process structures we integrate (control) state variables and binders of hybrid logic. The semantic interpretation relies on event/data transition systems; specification refinement is defined by model class inclusion. For the presentation of constructive specifications we propose operational event/data specifications allowing for familiar, diagrammatic representations by state transition graphs. We show that E<sup>↓</sup>-logic is powerful enough to characterise the semantics of an operational specification by a single E<sup>↓</sup>-sentence. Thus the whole development process can rely on E<sup>↓</sup>-logic and its semantics as a common basis. This includes also a variety of implementation constructors to support, among others, event refinement and parallel composition.

## **1 Introduction**

Event-based systems are an important kind of software systems which are open to the environment to react to certain events. A crucial characteristic of such systems is that not any event can (or should) be expected at any time. Hence the control flow of the system is significant and should be modelled by appropriate means. On the other hand, components administrate data which may change upon the occurrence of an event. Thus also the specification of admissible data changes caused by events plays a major role.

A. Madeira—Supported by ERDF through COMPETE 2020 and by National Funds through FCT with POCI-01-0145-FEDER-016692 and UID/MAT/04106/2019, in a contract foreseen in nos. 4–6 of art. 23 of the DL 57/2016, changed by DL 57/2017.

There is quite a lot of literature on modelling and specification of event-based systems. Many approaches, often underpinned by graphical notations, provide formalisms aiming at being constructive enough to suggest particular designs or implementations, like e.g., Event-B [1,7], symbolic transition systems [17], and UML behavioural and protocol state machines [12,16]. On the other hand, there are logical formalisms to express desired properties of event-based systems. Among them are temporal logics integrating state and event-based styles [4], and various kinds of modal logics involving data, like first-order dynamic logic [10] or the modal μ-calculus with data and time [9]. The gap between logics and constructive specification is usually filled by checking whether *the* model of a constructive specification satisfies certain logical formulae.

In this paper we are interested in investigating a logic which is capable of expressing properties of event/data-based systems on various abstraction levels in a common formalism. For this purpose we follow ideas of [15], but there data states, effects of events on them and constructive operational specifications (see below) were not considered. The advantage of an expressive logic is that we can split the transition from system requirements to system implementation into a series of gradual refinement steps which are easier to understand, to verify, and to adjust when certain aspects of the system are to be changed or when a product line of similar products has to be developed.

To that end we propose E<sup>↓</sup>-logic, a dynamic logic enriched with features of hybrid logic. The dynamic part uses diamond and box modalities over structured actions. Atomic actions are of the form e//ψ with e an event and ψ a state transition predicate specifying the admissible effects of e on the data. Using sequential composition, union, and iteration we obtain complex actions that, in connection with the modalities, can be used to specify required and forbidden behaviour. In particular, if E is a finite set of events, though data is infinite we are able to capture all reachable states of the system and to express safety and liveness properties. But E<sup>↓</sup>-logic is also powerful enough to specify concrete, recursive process structures by integrating state variables and binders from hybrid logic [6] with the subtle difference that our state variables are used to denote control states only. We show that the dynamic part of the logic is bisimulation invariant while the hybrid part, due to the ability to bind names to states, is not.

An axiomatic specification *Sp* = (Σ, *Ax*) in E<sup>↓</sup> is given by an event/data signature Σ = (E, A), with a set E of events and a set A of attributes to model local data states, and a set of E<sup>↓</sup>-sentences *Ax*, called axioms, expressing requirements. For the semantic interpretation we use event/data transition systems (edts). Their states are reachable configurations γ = (c, ω) where c is a control state, recording the current state of execution, and ω is a local data state, i.e., a valuation of the attributes. Transitions between configurations are labelled by events. The semantics of a specification *Sp* is "loose" in the sense that it consists of *all* edts satisfying the axioms of the specification. Such structures are called models of *Sp*. Loose semantics allows us to define a simple refinement notion: *Sp*<sub>1</sub> refines to *Sp*<sub>2</sub> if the model class of *Sp*<sub>2</sub> is included in the model class of *Sp*<sub>1</sub>. We may also say that *Sp*<sub>2</sub> is an implementation of *Sp*<sub>1</sub>.

Our refinement process starts typically with axiomatic specifications whose axioms involve only the dynamic part of the logic. Hybrid features will successively be added in refinements when specifying more concrete behaviours, like loops. Aiming at a concrete design, the use of an axiomatic specification style may, however, become cumbersome since we have to state explicitly also all negative cases, what the system should not do. For a convenient presentation of constructive specifications we propose operational event/data specifications, which are a kind of symbolic transition systems equipped again with a model class semantics in terms of edts. We will show that E<sup>↓</sup>-logic, by use of the hybrid binder, is powerful enough to characterise the semantics of an operational specification. Therefore we have not really left E<sup>↓</sup>-logic when refining axiomatic by operational specifications. Moreover, since several constructive notations in the literature, including (essential parts of) Event-B, symbolic transition systems, and UML protocol state machines, can be expressed as operational specifications, E<sup>↓</sup>-logic provides a logical umbrella under which event/data-based systems can be developed.

In order to consider more complex refinements we take up an idea of Sannella and Tarlecki [18,19] who have proposed the notion of constructor implementation. This is a generic notion applicable to specification formalisms based on signatures and semantic structures for signatures. As both are available in the context of E<sup>↓</sup>-logic, we complement our approach by introducing a couple of constructors, among them event refinement and parallel composition. For the latter we provide a useful refinement criterion relying on a relationship between syntactic and semantic parallel composition. The logic and the use of the implementation constructors will be illustrated by a running example.

Hereafter, in Sect. 2, we introduce syntax and semantics of E<sup>↓</sup>-logic. In Sect. 3, we consider axiomatic as well as operational specifications and demonstrate the expressiveness of E<sup>↓</sup>-logic. Refinement of both types of specifications using several implementation constructors is considered in Sect. 4. Section 5 provides some concluding remarks. Proofs of theorems and facts can be found in [11].

## **2 A Hybrid Dynamic Logic for Event/Data Systems**

We propose the logic E<sup>↓</sup> to specify and reason about event/data-based systems. E<sup>↓</sup>-logic is an extension of the hybrid dynamic logic considered in [15] by taking into account changing data. Therefore, we first summarise our underlying notions used for the treatment of data. We then introduce the syntax and semantics of E<sup>↓</sup> with its hybrid and dynamic logic features applied to events and data.

#### **2.1 Data States**

We assume given a universe D of *data values*. A *data signature* is given by a set A of *attributes*. An A-*data state* ω is a function ω : A → D. We denote by Ω(A) the set of all A-data states. For any data signature A, we assume given a set Φ(A) of *state predicates* to be interpreted over single A-data states, and a set Ψ(A) of *transition predicates* to be interpreted over pairs of pre- and post-A-data states. The concrete syntax of state and transition predicates is of no particular importance for the following. For an attribute a ∈ A, a state predicate may be, e.g., a > 0, and a transition predicate a′ = a + 1, where a refers to the value of attribute a in the pre-data state and a′ to its value in the post-data state. Still, both types of predicates are assumed to contain true and to be closed under negation (written ¬) and disjunction (written ∨); as usual, we will then also use false, ∧, etc. Furthermore, we assume for each A<sub>0</sub> ⊆ A a transition predicate id<sub>A<sub>0</sub></sub> ∈ Ψ(A) expressing that the values of attributes in A<sub>0</sub> are the same in pre- and post-A-data states.

We write ω ⊨<sup>D</sup><sub>A</sub> ϕ if ϕ ∈ Φ(A) is satisfied in data state ω; and (ω<sub>1</sub>, ω<sub>2</sub>) ⊨<sup>D</sup><sub>A</sub> ψ if ψ ∈ Ψ(A) is satisfied in the pre-data state ω<sub>1</sub> and post-data state ω<sub>2</sub>. In particular, (ω<sub>1</sub>, ω<sub>2</sub>) ⊨<sup>D</sup><sub>A</sub> id<sub>A<sub>0</sub></sub> if, and only if, ω<sub>1</sub>(a<sub>0</sub>) = ω<sub>2</sub>(a<sub>0</sub>) for all a<sub>0</sub> ∈ A<sub>0</sub>.
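Since the concrete syntax of predicates is left open, one simple way to experiment with these notions is to model data states as dictionaries and predicates as Boolean functions. The sketch below assumes integer data values and uses hypothetical names (`positive_a`, `increment_a`, `identity_on`, `neg`, `disj`); it illustrates the definitions and is not part of the formal development.

```python
from typing import Callable, Dict, Iterable

DataState = Dict[str, int]                           # a data state, with integer values assumed
StatePred = Callable[[DataState], bool]              # a state predicate over one data state
TransPred = Callable[[DataState, DataState], bool]   # a transition predicate over (pre, post)


def positive_a(w: DataState) -> bool:
    """State predicate a > 0."""
    return w["a"] > 0


def increment_a(pre: DataState, post: DataState) -> bool:
    """Transition predicate a' = a + 1."""
    return post["a"] == pre["a"] + 1


def identity_on(attrs: Iterable[str]) -> TransPred:
    """Transition predicate stating that the given attributes keep their values."""
    names = tuple(attrs)
    return lambda pre, post: all(pre[x] == post[x] for x in names)


def neg(psi: TransPred) -> TransPred:
    """Closure of transition predicates under negation."""
    return lambda pre, post: not psi(pre, post)


def disj(p: TransPred, q: TransPred) -> TransPred:
    """Closure of transition predicates under disjunction."""
    return lambda pre, post: p(pre, post) or q(pre, post)


pre, post = {"a": 1, "b": 7}, {"a": 2, "b": 7}
print(positive_a(pre), increment_a(pre, post), identity_on(["b"])(pre, post))
```

Here `identity_on(["b"])` holds for the pair of data states because only attribute a changes.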

## **2.2** *E<sup>↓</sup>***-Logic**

**Definition 1.** *An* event/data signature *(*ed signature*, for short)* Σ = (E, A) *consists of a finite set of* events E *and a data signature* A*. We write* E(Σ) *for* E *and* A(Σ) *for* A*. We also write* Ω(Σ) *for* Ω(A(Σ))*,* Φ(Σ) *for* Φ(A(Σ))*, and* Ψ(Σ) *for* Ψ(A(Σ))*. The class of ed signatures is denoted by Sig*<sup>E↓</sup>*.*

Any ed signature Σ determines a class of semantic structures, the *event/data transition systems* which are reachable transition systems with sets of initial states and events as labels on transitions. The states are pairs γ = (c, ω), called *configurations*, where c is a *control state* recording the current execution state and ω is an A(Σ)-data state; we write c(γ) for c and ω(γ) for ω.

**Definition 2.** *A* Σ*-*event/data transition system *(*Σ*-*edts*, for short)* M = (Γ, R, Γ<sub>0</sub>) *over an ed signature* Σ *consists of a set of* configurations Γ ⊆ C × Ω(Σ) *for a set of* control states C*; a family of* transition relations R = (R<sub>e</sub> ⊆ Γ × Γ)<sub>e∈E(Σ)</sub>*; and a non-empty set of* initial configurations Γ<sub>0</sub> ⊆ {c<sub>0</sub>} × Ω<sub>0</sub> *for an* initial control state c<sub>0</sub> ∈ C *and a set of* initial data states Ω<sub>0</sub> ⊆ Ω(Σ) *such that* Γ *is* reachable *via* R*, i.e., for all* γ ∈ Γ *there are* γ<sub>0</sub> ∈ Γ<sub>0</sub>*,* n ≥ 0*,* e<sub>1</sub>, ..., e<sub>n</sub> ∈ E(Σ)*, and* (γ<sub>i</sub>, γ<sub>i+1</sub>) ∈ R<sub>e<sub>i+1</sub></sub> *for all* 0 ≤ i < n *with* γ<sub>n</sub> = γ*. We write* Γ(M) *for* Γ*,* C(M) *for* C*,* R(M) *for* R*,* c<sub>0</sub>(M) *for* c<sub>0</sub>*,* Ω<sub>0</sub>(M) *for* Ω<sub>0</sub>*, and* Γ<sub>0</sub>(M) *for* Γ<sub>0</sub>*. The class of* Σ*-edts is denoted by Edts*<sup>E↓</sup>(Σ)*.*
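For a finite structure, the side conditions of this definition can be checked mechanically. The following sketch represents a configuration as a pair of a control state and a hashable data state and tests non-emptiness of the initial configurations, the shared initial control state, and reachability by a breadth-first search. The function name `is_edts` and the chosen representation are assumptions made for this illustration.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple

Config = Tuple[str, Hashable]        # (control state, data state)
Rel = Set[Tuple[Config, Config]]


def is_edts(configs: Set[Config], R: Dict[str, Rel], initial: Set[Config]) -> bool:
    """Check the side conditions on a finite candidate (configs, R, initial)."""
    if not initial or not initial <= configs:
        return False
    if len({c for (c, _) in initial}) != 1:   # one common initial control state
        return False
    seen, queue = set(initial), deque(initial)
    while queue:                              # breadth-first search along all events
        g = queue.popleft()
        for rel in R.values():
            for (src, dst) in rel:
                if src == g and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    # Every configuration must be reached, and no transition may leave the set.
    return seen == configs


g0, g1, g2 = ("c0", 0), ("c1", 1), ("c2", 5)
R = {"e": {(g0, g1)}}
print(is_edts({g0, g1}, R, {g0}), is_edts({g0, g1, g2}, R, {g0}))
```

The second call fails because the configuration with control state `c2` cannot be reached from the initial configuration.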

Atomic actions are given by expressions of the form e//ψ with e an event and ψ a state transition predicate. The intuition is that the occurrence of the event e causes a state transition in accordance with ψ, i.e., the pre- and post-data states satisfy ψ, and ψ specifies the possible effects of e. Following the ideas of dynamic logic we also use complex, structured actions formed over atomic actions by union, sequential composition and iteration. All kinds of actions over an ed signature Σ are called Σ-*event/data actions* (Σ-*ed actions*, for short). The set Λ(Σ) of Σ-ed actions is defined by the grammar

$$\lambda ::= e /\!\!/ \psi \mid \lambda_1 + \lambda_2 \mid \lambda_1; \lambda_2 \mid \lambda^*$$

where e ∈ E(Σ) and ψ ∈ Ψ(Σ). We use the following shorthand notations for actions: For a subset F = {e<sub>1</sub>, ..., e<sub>k</sub>} ⊆ E(Σ), we use the notation F to denote the complex action e<sub>1</sub>//true + ... + e<sub>k</sub>//true and −F to denote the action E(Σ) \ F. For the action E(Σ) we will write *E*. For e ∈ E(Σ), we use the notation e to denote the action e//true and −e to denote the action *E* \ {e}. Hence, if E(Σ) = {e<sub>1</sub>, ..., e<sub>n</sub>} and e<sub>i</sub> ∈ E(Σ), the action −e<sub>i</sub> stands for e<sub>1</sub>//true + ... + e<sub>i−1</sub>//true + e<sub>i+1</sub>//true + ... + e<sub>n</sub>//true.

The actions Λ(Σ) are *interpreted* over a Σ-edts M as the family of relations (R(M)<sub>λ</sub> ⊆ Γ(M) × Γ(M))<sub>λ∈Λ(Σ)</sub> defined by


To define the event/data formulae of E<sup>↓</sup> we assume given a countably infinite set X of control state variables which are used in formulae to denote the control part of a configuration. They can be bound by the binder operator ↓x and accessed by the jump operator @x of hybrid logic. The dynamic part of our logic is due to the modalities which can be formed over any ed action over a given ed signature. E<sup>↓</sup> thus retains from hybrid logic the use of binders, but omits free nominals. Thus sentences of the logic become restricted to express properties of configurations reachable from the initial ones.

**Definition 3.** *The set* Frm<sup>E↓</sup>(Σ) *of* Σ*-*ed formulae *over an ed signature* Σ *is given by*

$$\varrho ::= \varphi \mid x \mid {\downarrow} x.\,\varrho \mid @x.\,\varrho \mid \langle \lambda \rangle \varrho \mid \text{true} \mid \neg \varrho \mid \varrho_1 \vee \varrho_2$$

*where* ϕ ∈ Φ(Σ)*,* x ∈ X*, and* λ ∈ Λ(Σ)*. We write* [λ]ϱ *for* ¬⟨λ⟩¬ϱ *and we use the usual boolean connectives as well as the constant* false *to denote* ¬true*.*<sup>1</sup> *The set* Sen<sup>E↓</sup>(Σ) *of* Σ*-*ed sentences *consists of all* Σ*-ed formulae without free variables, where the free variables are defined as usual with* ↓x *being the unique operator binding variables.*

Given an ed signature Σ and a Σ-edts M, the satisfaction of a Σ-ed formula is inductively defined w.r.t. valuations v : X → C(M), mapping variables to control states, and configurations γ ∈ Γ(M):

– M, v, γ ⊨<sup>E↓</sup><sub>Σ</sub> ϕ iff ω(γ) ⊨<sup>D</sup><sub>A(Σ)</sub> ϕ;


<sup>1</sup> We use true and false for predicates and formulae; their meaning will always be clear from the context. For boolean values we will use instead the notations tt and ff .

– M, v, γ ⊨<sup>E↓</sup><sub>Σ</sub> true always holds;

– M, v, γ ⊨<sup>E↓</sup><sub>Σ</sub> ¬ϱ iff M, v, γ ⊭<sup>E↓</sup><sub>Σ</sub> ϱ;

– M, v, γ ⊨<sup>E↓</sup><sub>Σ</sub> ϱ<sub>1</sub> ∨ ϱ<sub>2</sub> iff M, v, γ ⊨<sup>E↓</sup><sub>Σ</sub> ϱ<sub>1</sub> or M, v, γ ⊨<sup>E↓</sup><sub>Σ</sub> ϱ<sub>2</sub>.

If ϱ is a sentence then the valuation is irrelevant. M *satisfies* a sentence ϱ ∈ Sen<sup>E↓</sup>(Σ), denoted by M ⊨<sup>E↓</sup><sub>Σ</sub> ϱ, if M, γ<sub>0</sub> ⊨<sup>E↓</sup><sub>Σ</sub> ϱ for all γ<sub>0</sub> ∈ Γ<sub>0</sub>(M).

By borrowing the modalities from dynamic logic [9,10], E<sup>↓</sup> is able to express liveness and safety requirements as illustrated in our running ATM example below. There we use the fact that we can state properties over all reachable states by sentences of the form [*E*<sup>∗</sup>]ϕ. In particular, deadlock-freedom can be expressed by [*E*<sup>∗</sup>]⟨*E*⟩true. The logic E<sup>↓</sup>, however, is also suited to directly express process structures and, thus, the implementation of abstract requirements. The binder operator is essential for this. For example, we can specify a process which switches a boolean value, denoted by the attribute val, from *tt* to *ff* and back by the following sentence:

$${\downarrow} x_0 .\, \mathsf{val} = tt \wedge \langle \mathsf{switch} /\!\!/ \mathsf{val}' = ff \rangle \langle \mathsf{switch} /\!\!/ \mathsf{val}' = tt \rangle x_0.$$
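The switch sentence can be evaluated on small finite structures with a naive interpreter. The sketch below is an illustration under assumptions, not the semantics as formally given in the paper: actions are interpreted as in standard dynamic logic (filtered transitions, union, relational composition, reflexive-transitive closure), a variable holds at a configuration whose control state equals its value, the binder stores the current control state, and the jump operator is left out because its clause is not reproduced in this text. Formulae and actions are encoded as tagged tuples, and all function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Data = Tuple[Tuple[str, object], ...]   # hashable data state: (attribute, value) pairs
Config = Tuple[str, Data]               # configuration (control state, data state)
Rel = Set[Tuple[Config, Config]]


def rel(R: Dict[str, Rel], configs: Set[Config], lam: tuple) -> Rel:
    """Relation denoted by an action (assumed standard dynamic-logic reading)."""
    kind = lam[0]
    if kind == "atom":    # e//psi: e-transitions whose data change satisfies psi
        _, e, psi = lam
        return {(g, h) for (g, h) in R.get(e, set()) if psi(dict(g[1]), dict(h[1]))}
    if kind == "+":       # union
        return rel(R, configs, lam[1]) | rel(R, configs, lam[2])
    if kind == ";":       # sequential composition
        r1, r2 = rel(R, configs, lam[1]), rel(R, configs, lam[2])
        return {(g, k) for (g, h) in r1 for (h2, k) in r2 if h == h2}
    if kind == "*":       # reflexive-transitive closure
        step = rel(R, configs, lam[1])
        closure = {(g, g) for g in configs}
        while True:
            bigger = closure | {(g, k) for (g, h) in closure for (h2, k) in step if h == h2}
            if bigger == closure:
                return closure
            closure = bigger
    raise ValueError(f"unknown action {lam!r}")


def sat(R, configs, v: Dict[str, str], g: Config, f: tuple) -> bool:
    """Satisfaction of a formula at configuration g under valuation v."""
    kind = f[0]
    if kind == "true":
        return True
    if kind == "pred":    # state predicate on the data part
        return f[1](dict(g[1]))
    if kind == "var":     # x holds iff the current control state is v(x)
        return v.get(f[1]) == g[0]
    if kind == "bind":    # bind x to the current control state
        return sat(R, configs, {**v, f[1]: g[0]}, g, f[2])
    if kind == "dia":     # diamond: some successor along the action satisfies the body
        return any(sat(R, configs, v, h, f[2]) for (s, h) in rel(R, configs, f[1]) if s == g)
    if kind == "not":
        return not sat(R, configs, v, g, f[1])
    if kind == "or":
        return sat(R, configs, v, g, f[1]) or sat(R, configs, v, g, f[2])
    raise ValueError(f"unknown formula {f!r}")


def conj(f: tuple, g: tuple) -> tuple:
    return ("not", ("or", ("not", f), ("not", g)))


def data(val: bool) -> Data:
    return (("val", val),)


to_ff = ("atom", "switch", lambda pre, post: post["val"] is False)
to_tt = ("atom", "switch", lambda pre, post: post["val"] is True)
sentence = ("bind", "x0", conj(("pred", lambda w: w["val"] is True),
                               ("dia", to_ff, ("dia", to_tt, ("var", "x0")))))

g0, g1, g2 = ("c0", data(True)), ("c1", data(False)), ("c2", data(True))
two_states = {"switch": {(g0, g1), (g1, g0)}}
three_states = {"switch": {(g0, g1), (g1, g2), (g2, g1)}}
print(sat(two_states, {g0, g1}, {}, g0, sentence))
print(sat(three_states, {g0, g1, g2}, {}, g0, sentence))
```

The second structure also alternates the value of val, but it does not return to the control state bound to x<sub>0</sub>, so the sentence fails there under this reading.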

#### **2.3 Bisimulation and Invariance**

Bisimulation is a crucial notion in both behavioural systems specification and in modal logics. On the specification side, it provides a standard way to identify systems with the same behaviour by abstracting the internal specifics of the systems; this is also reflected at the logic side, where bisimulation frequently relates states that satisfy the same formulae. We explore some properties of E<sup>↓</sup> w.r.t. bisimilarity. Let us first introduce the notion of bisimilarity in the context of E<sup>↓</sup>:

**Definition 4.** *Let* M<sub>1</sub>, M<sub>2</sub> *be* Σ*-edts. A relation* B ⊆ Γ(M<sub>1</sub>) × Γ(M<sub>2</sub>) *is a* bisimulation relation *between* M<sub>1</sub> *and* M<sub>2</sub> *if for all* (γ<sub>1</sub>, γ<sub>2</sub>) ∈ B *the following conditions hold:*

*(atom) for all* ϕ ∈ Φ(Σ)*,* ω(γ<sub>1</sub>) ⊨<sup>D</sup><sub>A(Σ)</sub> ϕ *iff* ω(γ<sub>2</sub>) ⊨<sup>D</sup><sub>A(Σ)</sub> ϕ*;*

*(zig) for all* e//ψ ∈ Λ(Σ) *and for all* γ′<sub>1</sub> ∈ Γ(M<sub>1</sub>) *with* (γ<sub>1</sub>, γ′<sub>1</sub>) ∈ R(M<sub>1</sub>)<sub>e//ψ</sub>*, there is a* γ′<sub>2</sub> ∈ Γ(M<sub>2</sub>) *such that* (γ<sub>2</sub>, γ′<sub>2</sub>) ∈ R(M<sub>2</sub>)<sub>e//ψ</sub> *and* (γ′<sub>1</sub>, γ′<sub>2</sub>) ∈ B*;*

*(zag) for all* e//ψ ∈ Λ(Σ) *and for all* γ′<sub>2</sub> ∈ Γ(M<sub>2</sub>) *with* (γ<sub>2</sub>, γ′<sub>2</sub>) ∈ R(M<sub>2</sub>)<sub>e//ψ</sub>*, there is a* γ′<sub>1</sub> ∈ Γ(M<sub>1</sub>) *such that* (γ<sub>1</sub>, γ′<sub>1</sub>) ∈ R(M<sub>1</sub>)<sub>e//ψ</sub> *and* (γ′<sub>1</sub>, γ′<sub>2</sub>) ∈ B*.*

M<sub>1</sub> *and* M<sub>2</sub> *are* bisimilar*, in symbols* M<sub>1</sub> ∼ M<sub>2</sub>*, if there exists a bisimulation relation* B ⊆ Γ(M<sub>1</sub>) × Γ(M<sub>2</sub>) *between* M<sub>1</sub> *and* M<sub>2</sub> *such that*

*(init) for any* γ<sub>1</sub> ∈ Γ<sub>0</sub>(M<sub>1</sub>)*, there is a* γ<sub>2</sub> ∈ Γ<sub>0</sub>(M<sub>2</sub>) *such that* (γ<sub>1</sub>, γ<sub>2</sub>) ∈ B *and for any* γ<sub>2</sub> ∈ Γ<sub>0</sub>(M<sub>2</sub>)*, there is a* γ<sub>1</sub> ∈ Γ<sub>0</sub>(M<sub>1</sub>) *such that* (γ<sub>1</sub>, γ<sub>2</sub>) ∈ B*.*

Now we are able to establish a Hennessy-Milner like correspondence for a fragment of E<sup>↓</sup>. Let us call *hybrid-free sentences of* E<sup>↓</sup> the formulae obtained by the grammar

$$
\varrho ::= \varphi \mid \langle \lambda \rangle \varrho \mid \text{true} \mid \neg \varrho \mid \varrho_1 \vee \varrho_2 .
$$

**Theorem 1.** *Let* M<sub>1</sub>, M<sub>2</sub> *be bisimilar* Σ*-edts. Then* M<sub>1</sub> ⊨<sup>E↓</sup><sub>Σ</sub> ϱ *iff* M<sub>2</sub> ⊨<sup>E↓</sup><sub>Σ</sub> ϱ *for all hybrid-free sentences* ϱ*.*

The converse of Theorem 1 does not hold in general, and the usual image-finiteness assumption has to be imposed: A Σ-edts M is *image-finite* if, for all γ ∈ Γ(M) and all e ∈ E(Σ), the set {γ′ | (γ, γ′) ∈ R(M)<sub>e</sub>} is finite. Then:

**Theorem 2.** *Let* M<sub>1</sub>, M<sub>2</sub> *be image-finite* Σ*-edts and* γ<sub>1</sub> ∈ Γ(M<sub>1</sub>)*,* γ<sub>2</sub> ∈ Γ(M<sub>2</sub>) *such that* M<sub>1</sub>, γ<sub>1</sub> ⊨<sup>E↓</sup><sub>Σ</sub> ϱ *iff* M<sub>2</sub>, γ<sub>2</sub> ⊨<sup>E↓</sup><sub>Σ</sub> ϱ *for all hybrid-free sentences* ϱ*. Then there exists a bisimulation* B *between* M<sub>1</sub> *and* M<sub>2</sub> *such that* (γ<sub>1</sub>, γ<sub>2</sub>) ∈ B*.*

## **3 Specifications of Event/Data Systems**

#### **3.1 Axiomatic Specifications**

Sentences of E<sup>↓</sup>-logic can be used to specify properties of event/data systems and thus to write system specifications in an axiomatic way.

**Definition 5.** *An* axiomatic ed specification *Sp* = (Σ(*Sp*), *Ax*(*Sp*)) *in* E<sup>↓</sup> *consists of an ed signature* Σ(*Sp*) ∈ *Sig*<sup>E↓</sup> *and a set of* axioms *Ax*(*Sp*) ⊆ Sen<sup>E↓</sup>(Σ(*Sp*))*.*

*The* semantics *of Sp is given by the pair* (Σ(*Sp*), Mod(*Sp*)) *where* Mod(*Sp*) = {M ∈ *Edts*<sup>E↓</sup>(Σ(*Sp*)) | M ⊨<sup>E↓</sup><sub>Σ(*Sp*)</sub> *Ax*(*Sp*)}*. The edts in* Mod(*Sp*) *are called* models *of Sp and* Mod(*Sp*) *is the* model class *of Sp.*

As a direct consequence of Theorem 1 we have:

**Corollary 1.** *The model class of an axiomatic ed specification exclusively expressed by hybrid-free sentences is closed under bisimulation.*

This result does not hold for sentences with hybrid features. For instance, consider the specification *Sp* = (({e}, {a}), {↓x . ⟨e⫽a′ = a⟩x}): an edts with a single control state c<sub>0</sub> and a loop transition R<sub>e</sub> = {(γ<sub>0</sub>, γ<sub>0</sub>)} for c(γ<sub>0</sub>) = c<sub>0</sub> is a model of *Sp*. However, this is obviously not the case for its bisimilar edts with two control states c<sub>0</sub> and c and the relation R′<sub>e</sub> = {(γ<sub>0</sub>, γ), (γ, γ<sub>0</sub>)} with c(γ<sub>0</sub>) = c<sub>0</sub>, c(γ) = c and ω(γ<sub>0</sub>) = ω(γ).
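The counterexample can be replayed on finite structures. The following is a minimal sketch, assuming the axiom reads ↓x . ⟨e⫽a′ = a⟩x and assuming a hypothetical encoding in which a configuration is a pair (control state, value of a) and R<sub>e</sub> is a set of configuration pairs. The function `holds_loop_axiom` evaluates only this one sentence; it is not a general E<sup>↓</sup> evaluator.

```python
# Illustrative encoding (an assumption, not taken from the paper):
# a configuration is (control_state, value_of_a); r_e is a set of pairs of configurations.

def holds_loop_axiom(initial, r_e):
    """Check the sentence  ↓x . <e // a' = a> x  at every initial configuration."""
    for (c0, a0) in initial:
        x = c0  # the binder ↓x names the current control state
        ok = any(
            src == (c0, a0) and tgt_a == a0 and tgt_c == x
            for (src, (tgt_c, tgt_a)) in r_e
        )
        if not ok:
            return False
    return True

# Model 1: a single control state c0 with an e-loop.
g0 = ("c0", 0)
model1 = ({g0}, {(g0, g0)})

# Model 2: two control states; e alternates between them and leaves the data unchanged.
g = ("c", 0)
model2 = ({g0}, {(g0, g), (g, g0)})

print(holds_loop_axiom(*model1))  # True
print(holds_loop_axiom(*model2))  # False
```

The second structure fails only because the e-successor carries a different control state name, which is exactly the information a bisimulation does not preserve.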

*Example 1.* As a running example we consider an ATM. We start with an abstract specification *Sp*<sub>0</sub> of fundamental requirements for its interaction behaviour based on the set of events E<sub>0</sub> = {insertCard, enterPIN, ejectCard, cancel}<sup>2</sup> and on the singleton set of attributes A<sub>0</sub> = {chk} where chk is boolean valued and records the correctness of an entered PIN. Hence our first ed signature is Σ<sub>0</sub> = (E<sub>0</sub>, A<sub>0</sub>) and *Sp*<sub>0</sub> = (Σ<sub>0</sub>, *Ax*<sub>0</sub>) where *Ax*<sub>0</sub> requires the following properties expressed by corresponding axioms (0.1–0.3):

<sup>2</sup> For shortening the presentation we omit further events like withdrawing money, etc.

– "Whenever a card has been inserted, a correct PIN can eventually be entered and also the transaction can eventually be cancelled."

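$$[E^*; \mathsf{insertCard}]\big(\langle E^*; \mathsf{enterPIN} /\!\!/ \mathsf{chk}' = \mathit{tt} \rangle \mathrm{true} \wedge \langle E^*; \mathsf{cancel} \rangle \mathrm{true}\big) \tag{0.1}$$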

– "Whenever either a correct PIN has been entered or the transaction has been cancelled, the card can eventually be ejected."

$$[E^*; (\mathsf{enterPIN} /\!\!/ \mathsf{chk}' = \mathit{tt}) + \mathsf{cancel}] \langle E^*; \mathsf{ejectCard} \rangle \mathrm{true} \tag{0.2}$$

– "Whenever an incorrect PIN has been entered three times in a row, the current card is not returned." This means that the card is kept by the ATM which is not modelled by an extra event. It may, however, still be possible that another card is inserted afterwards. So an ejectCard can only be forbidden as long as no next card is inserted.

$$[E^*; (\mathsf{enterPIN} /\!\!/ \mathsf{chk}' = \mathit{ff})^3; (-\mathsf{insertCard})^*; \mathsf{ejectCard}] \mathrm{false} \tag{0.3}$$

where λ<sup>n</sup> abbreviates the n-fold sequential composition λ; ... ; λ.

The semantics of an axiomatic ed specification is loose, usually allowing for many different realisations. A refinement step is therefore understood as a restriction of the model class of an abstract specification. Following the terminology of Sannella and Tarlecki [18,19], we call a specification refining another one an *implementation*. Formally, a specification *Sp*′ is a *simple implementation* of a specification *Sp* over the same signature, in symbols *Sp* ⇝ *Sp*′, whenever Mod(*Sp*) ⊇ Mod(*Sp*′). Transitivity of the inclusion relation ensures gradual step-by-step development by a series of refinements.

*Example 2.* We provide a refinement *Sp*<sub>0</sub> ⇝ *Sp*<sub>1</sub> where *Sp*<sub>1</sub> = (Σ<sub>0</sub>, *Ax*<sub>1</sub>) has the same signature as *Sp*<sub>0</sub> and *Ax*<sub>1</sub> are the sentences (1.1–1.4) below; the last two use binders to specify a loop. As is easily seen, all models of *Sp*<sub>1</sub> must satisfy the axioms of *Sp*<sub>0</sub>.

– "At the beginning a card can be inserted with the effect that chk is set to *ff* ; nothing else is possible at the beginning."

$$\begin{array}{l} \langle \mathsf{insertCard} /\!\!/ \mathsf{chk}' = \mathit{ff} \rangle \mathrm{true} \wedge \\ {[\mathsf{insertCard} /\!\!/ \neg(\mathsf{chk}' = \mathit{ff})]} \mathrm{false} \wedge [-\mathsf{insertCard}] \mathrm{false} \end{array} \tag{1.1}$$

– "Whenever a card has been inserted, a PIN can be entered (directly afterwards) and also the transaction can be cancelled; but nothing else."

$$\begin{array}{l} {[E^*; \mathsf{insertCard}]} \big( \langle \mathsf{enterPIN} \rangle \mathrm{true} \wedge \langle \mathsf{cancel} \rangle \mathrm{true} \wedge \\ \qquad [-\{\mathsf{enterPIN}, \mathsf{cancel}\}] \mathrm{false} \big) \end{array} \tag{1.2}$$

– "Whenever either a correct PIN has been entered or the transaction has been cancelled, the card can eventually be ejected and the ATM starts from the beginning."

$$\downarrow x_0 \,.\, [E^*; (\mathsf{enterPIN} /\!\!/ \mathsf{chk}' = \mathit{tt}) + \mathsf{cancel}] \langle E^*; \mathsf{ejectCard} \rangle x_0 \tag{1.3}$$

– "Whenever an incorrect PIN has been entered three times in a row the ATM starts from the beginning." Hence the current card is kept.

$$\downarrow x_0 \,.\, [E^*; (\mathsf{enterPIN} /\!\!/ \mathsf{chk}' = \mathit{ff})^3] x_0 \tag{1.4}$$

#### **3.2 Operational Specifications**

Operational event/data specifications are introduced as a means to specify the properties of event/data systems in a more constructive style. They are not appropriate for writing abstract requirements, for which axiomatic specifications are recommended. Though E<sup>↓</sup>-logic is able to specify concrete models, as discussed in Sect. 2, the use of operational specifications allows a graphic representation close to familiar formalisms in the literature, like UML protocol state machines, cf. [12,16]. As will be shown in Sect. 3.3, finite operational specifications can be characterised by a sentence in E<sup>↓</sup>-logic. Therefore, E<sup>↓</sup>-logic is still the common basis of our development approach. Transitions in an operational specification are tuples (c, ϕ, e, ψ, c′) with c a source control state, ϕ a precondition, e an event, ψ a state transition predicate specifying the possible effects of the event e, and c′ a target control state. In the semantic models an event must be enabled whenever the respective source data state satisfies the precondition. Thus isolating preconditions has a semantic consequence that is not expressible by transition predicates only. The effect of the event must respect ψ; no other transitions are allowed.

**Fig. 1.** Operational ed specification ATM

**Definition 6.** *An* operational ed specification O = (Σ, C, T, (c<sub>0</sub>, ϕ<sub>0</sub>)) *is given by an ed signature* Σ*, a set of* control states C*, a* transition relation specification T ⊆ C × Φ(Σ) × E(Σ) × Ψ(Σ) × C*, an* initial control state c<sub>0</sub> ∈ C*, and an* initial state predicate ϕ<sub>0</sub> ∈ Φ(Σ)*, such that* C *is* syntactically reachable*, i.e., for every* c ∈ C \ {c<sub>0</sub>} *there are* (c<sub>0</sub>, ϕ<sub>1</sub>, e<sub>1</sub>, ψ<sub>1</sub>, c<sub>1</sub>), ..., (c<sub>n−1</sub>, ϕ<sub>n</sub>, e<sub>n</sub>, ψ<sub>n</sub>, c<sub>n</sub>) ∈ T *with* n > 0 *such that* c<sub>n</sub> = c*. We write* Σ(O) *for* Σ*, etc.*

*A* Σ*-edts* M *is a* model *of* O *if* C(M) = C *up to a bijective renaming,* c<sub>0</sub>(M) = c<sub>0</sub>*,* Ω<sub>0</sub>(M) ⊆ {ω | ω |=<sup>D</sup><sub>A(Σ)</sub> ϕ<sub>0</sub>}*, and if the following conditions hold:*

*– for all* (c, ϕ, e, ψ, c′) ∈ T *and* ω ∈ Ω(A(Σ)) *with* ω |=<sup>D</sup><sub>A(Σ)</sub> ϕ*, there is a* ((c, ω), (c′, ω′)) ∈ R(M)<sub>e</sub> *with* (ω, ω′) |=<sup>D</sup><sub>A(Σ)</sub> ψ*;*

*– for all* ((c, ω), (c′, ω′)) ∈ R(M)<sub>e</sub> *there is a* (c, ϕ, e, ψ, c′) ∈ T *with* ω |=<sup>D</sup><sub>A(Σ)</sub> ϕ *and* (ω, ω′) |=<sup>D</sup><sub>A(Σ)</sub> ψ*.*

*The class of all models of* O *is denoted by* Mod(O)*. The* semantics *of* O *is given by the pair* (Σ(O), Mod(O)) *where* Σ(O) = Σ*.*
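The two transition conditions of Definition 6 can be checked mechanically when the data domain is finite. The following is a minimal sketch under assumptions of our own: preconditions and transition predicates are encoded as Python predicates, the relation of the edts as a dictionary from events to sets of transitions, and `data_states` as a finite enumeration of the data states. The conditions on control states and on initial data states are not checked here.

```python
# Hypothetical finite encoding (all names are assumptions made for illustration):
#   spec_T:      tuples (c, pre, e, eff, c2) with predicates pre(w) and eff(w, w2)
#   rel:         dict mapping an event to a set of ((c, w), (c2, w2)) pairs
#   data_states: a finite enumeration of the data states considered

def satisfies_transition_conditions(spec_T, rel, data_states):
    # Condition 1: every specified transition is enabled wherever its precondition holds.
    for (c, pre, e, eff, c2) in spec_T:
        for w in data_states:
            if pre(w) and not any(
                src == (c, w) and tc == c2 and eff(w, tw)
                for (src, (tc, tw)) in rel.get(e, set())
            ):
                return False
    # Condition 2: every transition of the model is justified by a specified transition.
    for e, pairs in rel.items():
        for ((c, w), (c2, w2)) in pairs:
            if not any(
                (sc, se, sc2) == (c, e, c2) and pre(w) and eff(w, w2)
                for (sc, pre, se, eff, sc2) in spec_T
            ):
                return False
    return True

# One control state, one boolean attribute; the event "toggle" must flip the attribute.
T = [("c0", lambda w: True, "toggle", lambda w, w2: w2 == (not w), "c0")]
good = {"toggle": {(("c0", False), ("c0", True)), (("c0", True), ("c0", False))}}
bad = {"toggle": good["toggle"] | {(("c0", False), ("c0", False))}}

print(satisfies_transition_conditions(T, good, [False, True]))  # True
print(satisfies_transition_conditions(T, bad, [False, True]))   # False
```

The structure `bad` violates the second condition: its extra transition does not respect the effect predicate of any specified transition.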

*Example 3.* We construct an operational ed specification, called *ATM*, for the ATM example. The signature of *ATM* extends the one of *Sp*<sub>1</sub> (and *Sp*<sub>0</sub>) by an additional integer-valued attribute trls which counts the number of attempts to enter a correct PIN (with the same card). *ATM* is graphically presented in Fig. 1. The initial control state is *Card*, and the initial state predicate is true. Preconditions are written before the symbol →. If no precondition is explicitly indicated it is assumed to be true. Due to the extended signature, *ATM* is not a simple implementation of *Sp*<sub>1</sub>, and we will only formally justify the implementation relationship in Example 5.

Operational specifications can be composed by a syntactic parallel composition operator which synchronises shared events. Two ed signatures Σ<sub>1</sub> and Σ<sub>2</sub> are *composable* if A(Σ<sub>1</sub>) ∩ A(Σ<sub>2</sub>) = ∅. Their parallel composition is given by Σ<sub>1</sub> ⊗ Σ<sub>2</sub> = (E(Σ<sub>1</sub>) ∪ E(Σ<sub>2</sub>), A(Σ<sub>1</sub>) ∪ A(Σ<sub>2</sub>)).

**Definition 7.** *Let* Σ<sub>1</sub> *and* Σ<sub>2</sub> *be composable ed signatures and let* O<sub>1</sub> *and* O<sub>2</sub> *be operational ed specifications with* Σ(O<sub>1</sub>) = Σ<sub>1</sub> *and* Σ(O<sub>2</sub>) = Σ<sub>2</sub>*. The* parallel composition *of* O<sub>1</sub> *and* O<sub>2</sub> *is given by the operational ed specification* O<sub>1</sub> ∥ O<sub>2</sub> = (Σ<sub>1</sub> ⊗ Σ<sub>2</sub>, C, T, (c<sub>0</sub>, ϕ<sub>0</sub>)) *with* c<sub>0</sub> = (c<sub>0</sub>(O<sub>1</sub>), c<sub>0</sub>(O<sub>2</sub>))*,* ϕ<sub>0</sub> = ϕ<sub>0</sub>(O<sub>1</sub>) ∧ ϕ<sub>0</sub>(O<sub>2</sub>)*, and* C *and* T *are inductively defined by* c<sub>0</sub> ∈ C *and*


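*– for* e ∈ E(Σ<sub>1</sub>) ∩ E(Σ<sub>2</sub>)*,* c<sub>1</sub>, c′<sub>1</sub> ∈ C(O<sub>1</sub>)*, and* c<sub>2</sub>, c′<sub>2</sub> ∈ C(O<sub>2</sub>)*, if* (c<sub>1</sub>, c<sub>2</sub>) ∈ C*,* (c<sub>1</sub>, ϕ<sub>1</sub>, e, ψ<sub>1</sub>, c′<sub>1</sub>) ∈ T(O<sub>1</sub>)*, and* (c<sub>2</sub>, ϕ<sub>2</sub>, e, ψ<sub>2</sub>, c′<sub>2</sub>) ∈ T(O<sub>2</sub>)*, then* (c′<sub>1</sub>, c′<sub>2</sub>) ∈ C *and* ((c<sub>1</sub>, c<sub>2</sub>), ϕ<sub>1</sub> ∧ ϕ<sub>2</sub>, e, ψ<sub>1</sub> ∧ ψ<sub>2</sub>, (c′<sub>1</sub>, c′<sub>2</sub>)) ∈ T*.*<sup>3</sup>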

## **3.3 Expressiveness of** *E↓***-Logic**

We show that the semantics of an operational ed specification O with finitely many control states can be characterised by a single E<sup>↓</sup>-sentence ϱ<sub>O</sub>, i.e., an edts M is a model of O iff M |=<sup>E↓</sup><sub>Σ(O)</sub> ϱ<sub>O</sub>. Using Algorithm 1, such a characterising sentence is

$$\varrho_O = \downarrow c_0 \,.\, \varphi_0 \wedge \mathsf{sen}(c_0, \mathit{Im}_O(c_0), C(O), \{c_0\})\,,$$

where c<sub>0</sub> = c<sub>0</sub>(O) and ϕ<sub>0</sub> = ϕ<sub>0</sub>(O). Algorithm 1 closely follows the procedure in [15] for characterising a finite structure by a sentence of D<sup>↓</sup>-logic. A call sen(c, I, V, B) performs a recursive breadth-first traversal through O starting from c, where I holds the unprocessed quadruples (ϕ, e, ψ, c′) of transitions outgoing from c, V the remaining states to visit, and B the set of already bound states. The function first requires the existence of each outgoing transition of I, provided its precondition holds, in the resulting formula, binding any newly reached state. Then it requires that no other transitions with source state c exist using calls to fin. Having visited all states in V, it finally requires all states in C(O) to be pairwise different.

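**Algorithm 1.** Constructing a sentence from an operational ed specification

**Require:** O ≡ finite operational ed specification

$$\begin{array}{l}
\mathit{Im}_O(c) = \{(\varphi, e, \psi, c') \mid (c, \varphi, e, \psi, c') \in T(O)\} \text{ for } c \in C(O) \\
\mathit{Im}_O(c, e) = \{(\varphi, \psi, c') \mid (c, \varphi, e, \psi, c') \in T(O)\} \text{ for } c \in C(O),\ e \in E(\Sigma(O)) \\[1ex]
\textbf{function } \mathsf{sen}(c, I, V, B) \qquad \triangleright\ c\text{: state, } I\text{: image to visit, } V\text{: states to visit, } B\text{: bound states} \\
\quad \textbf{if } I \neq \emptyset \textbf{ then} \\
\qquad (\varphi, e, \psi, c') \leftarrow \textbf{choose } I \\
\qquad \textbf{if } c' \in B \textbf{ then} \\
\qquad\quad \textbf{return } @c \,.\, \varphi \rightarrow \langle e /\!\!/ \psi \rangle (c' \wedge \mathsf{sen}(c, I \setminus \{(\varphi, e, \psi, c')\}, V, B)) \\
\qquad \textbf{else} \\
\qquad\quad \textbf{return } @c \,.\, \varphi \rightarrow \langle e /\!\!/ \psi \rangle (\downarrow c' \,.\, \mathsf{sen}(c, I \setminus \{(\varphi, e, \psi, c')\}, V, B \cup \{c'\})) \\
\quad V \leftarrow V \setminus \{c\} \\
\quad \textbf{if } V \neq \emptyset \textbf{ then} \\
\qquad c' \leftarrow \textbf{choose } B \cap V \\
\qquad \textbf{return } \mathsf{fin}(c) \wedge \mathsf{sen}(c', \mathit{Im}_O(c'), V, B) \\
\quad \textbf{return } \mathsf{fin}(c) \wedge \bigwedge_{c_1 \in C(O),\, c_2 \in C(O) \setminus \{c_1\}} \neg @c_1 \,.\, c_2 \\[1ex]
\textbf{function } \mathsf{fin}(c) \\
\quad \textbf{return } @c \,.\, \bigwedge_{e \in E(\Sigma(O))} \bigwedge_{P \subseteq \mathit{Im}_O(c, e)} \Big[ e /\!\!/ \Big( \bigwedge_{(\varphi, \psi, c') \in P} (\varphi \wedge \psi) \Big) \wedge \neg \Big( \bigvee_{(\varphi, \psi, c') \in \mathit{Im}_O(c, e) \setminus P} (\varphi \wedge \psi) \Big) \Big] \bigvee_{(\varphi, \psi, c') \in P} c'
\end{array}$$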

<sup>3</sup> Note that joint moves with e cannot become inconsistent due to composability of ed signatures.

It is fin(c) where this algorithm mainly deviates from [15]: To ensure that no other transitions from c exist than those specified in O, fin(c) produces the requirement that at state c, for every event e and for every subset P of the transitions outgoing from c, whenever an e-transition can be done with the combined effect of P but not adhering to any of the effects of the currently not selected transitions, the e-transition must have one of the states as its target that are target states of P. The rather complicated formulation is due to possibly overlapping preconditions where for a single event e the preconditions of two different transitions may be satisfied simultaneously. For a state c, where all outgoing transitions for the same event have disjoint preconditions, the E<sup>↓</sup>-formula returned by fin(c) is equivalent to

$$@c \,.\, \bigwedge_{e \in E(\Sigma(O))} \Big( \bigwedge_{(\varphi, \psi, c') \in \mathit{Im}_O(c, e)} [e /\!\!/ \varphi \wedge \psi]\, c' \;\wedge\; \Big[ e /\!\!/ \neg \Big( \bigvee_{(\varphi, \psi, c') \in \mathit{Im}_O(c, e)} (\varphi \wedge \psi) \Big) \Big] \mathrm{false} \Big).$$
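To make the recursion of Algorithm 1 concrete, the following sketch builds the characterising sentence as a plain string. It assumes that "if I ≠ ∅" and "if V ≠ ∅" are the intended guards, encodes transitions as tuples of strings, renders connectives in ASCII (`&`, `|`, `!`, `->`, `//`), and resolves **choose** deterministically with `min`. All of these encodings are assumptions made for illustration and not part of the original algorithm.

```python
from itertools import combinations

def im(T, c):
    """Im_O(c): outgoing quadruples (pre, event, effect, target) of state c."""
    return {(p, e, q, c2) for (c1, p, e, q, c2) in T if c1 == c}

def im_e(T, c, e):
    """Im_O(c, e): outgoing triples (pre, effect, target) of state c for event e."""
    return sorted((p, q, c2) for (c1, p, e1, q, c2) in T if c1 == c and e1 == e)

def conj(parts):
    return " & ".join(parts) if parts else "true"

def disj(parts):
    return " | ".join(parts) if parts else "false"

def fin(T, E, c):
    """No transitions from c other than the specified ones (one clause per event and subset P)."""
    clauses = []
    for e in sorted(E):
        out = im_e(T, c, e)
        for k in range(len(out) + 1):
            for P in combinations(out, k):
                rest = [t for t in out if t not in P]
                effect = conj([f"({p} & {q})" for (p, q, _) in P])
                others = disj([f"({p} & {q})" for (p, q, _) in rest])
                targets = disj([c2 for (_, _, c2) in P])
                clauses.append(f"[{e} // ({effect}) & !({others})]({targets})")
    return f"@{c}.(" + conj(clauses) + ")"

def sen(T, C, E, c, I, V, B):
    if I:
        (p, e, q, c2) = min(I)                      # "choose I"
        rest = I - {(p, e, q, c2)}
        if c2 in B:
            body = f"{c2} & {sen(T, C, E, c, rest, V, B)}"
        else:
            body = f"↓{c2}.{sen(T, C, E, c, rest, V, B | {c2})}"
        return f"@{c}.{p} -> <{e} // {q}>({body})"
    V = V - {c}
    if V:
        c2 = min(B & V)                             # "choose B ∩ V"
        return f"{fin(T, E, c)} & {sen(T, C, E, c2, im(T, c2), V, B)}"
    distinct = conj([f"!@{c1}.{c2}" for c1 in sorted(C) for c2 in sorted(C) if c1 != c2])
    return f"{fin(T, E, c)} & {distinct}"

def characterise(T, C, E, c0, phi0):
    return f"↓{c0}.{phi0} & {sen(T, C, E, c0, im(T, c0), set(C), {c0})}"

# A two-state switch as a small hypothetical specification.
T = [("Off", "true", "on", "true", "On"), ("On", "true", "off", "true", "Off")]
print(characterise(T, {"Off", "On"}, {"on", "off"}, "Off", "true"))
```

Note that `fin` enumerates all subsets P of the outgoing transitions per event, so its output grows exponentially in the number of transitions that share a source state and an event.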

*Example 4.* We show the first few steps of representing the operational ed specification *ATM* of Fig. 1 as an E<sup>↓</sup>-sentence ϱ<sub>*ATM*</sub>. This top-level sentence is

$$\begin{array}{l} \downarrow \mathit{Card} \,.\, \mathrm{true} \wedge \mathsf{sen}(\mathit{Card}, \{(\mathrm{true}, \mathsf{insertCard}, \mathsf{chk}' = \mathit{ff} \wedge \mathsf{trls}' = 0, \mathit{PIN})\}, \\ \qquad \{\mathit{Card}, \mathit{PIN}, \mathit{Return}\}, \{\mathit{Card}\}). \end{array}$$

The first call of sen(*Card*, ...) explores the single outgoing transition from *Card* to *PIN*, adds *PIN* to the bound states, and hence expands to

$$\begin{array}{l} @\mathit{Card} \,.\, \mathrm{true} \rightarrow \langle \mathsf{insertCard} /\!\!/ \mathsf{chk}' = \mathit{ff} \wedge \mathsf{trls}' = 0 \rangle \downarrow \mathit{PIN} \,. \\ \qquad \mathsf{sen}(\mathit{Card}, \emptyset, \{\mathit{Card}, \mathit{PIN}, \mathit{Return}\}, \{\mathit{Card}, \mathit{PIN}\}). \end{array}$$

Now all outgoing transitions from *Card* have been explored and the next call of sen(*Card*, ∅, ...) removes *Card* from the set of states to be visited, resulting in

$$\begin{array}{l} \mathsf{fin}(\mathit{Card}) \wedge \mathsf{sen}(\mathit{PIN}, \{(\mathsf{trls} < 2, \mathsf{enterPIN}, \dots), (\mathsf{trls} = 2, \mathsf{enterPIN}, \dots), \\ \qquad (\mathsf{trls} \le 2, \mathsf{enterPIN}, \dots), (\mathrm{true}, \mathsf{cancel}, \dots)\}, \\ \qquad \{\mathit{PIN}, \mathit{Return}\}, \{\mathit{Card}, \mathit{PIN}\}). \end{array}$$

As there is only a single outgoing transition from *Card*, the special case of disjoint preconditions applies for the finalisation call, and fin(*Card*) results in

$$\begin{array}{l} @\mathit{Card} \,.\, [\mathsf{insertCard} /\!\!/ \mathsf{chk}' = \mathit{ff} \wedge \mathsf{trls}' = 0]\, \mathit{PIN} \wedge [\mathsf{insertCard} /\!\!/ \mathsf{chk}' = \mathit{tt} \vee \mathsf{trls}' \neq 0] \mathrm{false} \wedge \\ \qquad [\mathsf{enterPIN} /\!\!/ \mathrm{true}] \mathrm{false} \wedge [\mathsf{cancel} /\!\!/ \mathrm{true}] \mathrm{false} \wedge [\mathsf{ejectCard} /\!\!/ \mathrm{true}] \mathrm{false}. \end{array}$$

## **4 Constructor Implementations**

The implementation notion defined in Sect. 3.1 is too simple for many practical applications. It requires the same signature for specification and implementation and does not support the process of constructing an implementation. Therefore, Sannella and Tarlecki [18,19] have proposed the notion of constructor implementation which is a generic notion applicable to specification formalisms which are based on signatures and semantic structures for signatures. We will reuse the ideas in the context of E↓-logic.

The notion of *constructor* is the basis: for signatures Σ<sub>1</sub>, ..., Σ<sub>n</sub>, Σ ∈ *Sig*<sup>E↓</sup>, a *constructor* κ from (Σ<sub>1</sub>, ..., Σ<sub>n</sub>) to Σ is a (total) function κ : *Edts*<sup>E↓</sup>(Σ<sub>1</sub>) × ... × *Edts*<sup>E↓</sup>(Σ<sub>n</sub>) → *Edts*<sup>E↓</sup>(Σ). Given a constructor κ from (Σ<sub>1</sub>, ..., Σ<sub>n</sub>) to Σ and a set of constructors κ<sub>i</sub> from (Σ<sub>i</sub><sup>1</sup>, ..., Σ<sub>i</sub><sup>k<sub>i</sub></sup>) to Σ<sub>i</sub>, 1 ≤ i ≤ n, the constructor (κ<sub>1</sub>, ..., κ<sub>n</sub>); κ from (Σ<sub>1</sub><sup>1</sup>, ..., Σ<sub>1</sub><sup>k<sub>1</sub></sup>, ..., Σ<sub>n</sub><sup>1</sup>, ..., Σ<sub>n</sub><sup>k<sub>n</sub></sup>) to Σ is obtained by the usual composition of functions. The following definitions apply to both axiomatic and operational ed specifications since the semantics of both is given in terms of ed signatures and model classes of edts. In particular, the implementation notion allows axiomatic specifications to be implemented by operational specifications.

**Definition 8.** *Given specifications Sp*, *Sp*<sub>1</sub>, ..., *Sp*<sub>n</sub> *and a constructor* κ *from* (Σ(*Sp*<sub>1</sub>), ..., Σ(*Sp*<sub>n</sub>)) *to* Σ(*Sp*)*, the tuple* ⟨*Sp*<sub>1</sub>, ..., *Sp*<sub>n</sub>⟩ *is a* constructor implementation via κ *of Sp, in symbols Sp* ⇝<sub>κ</sub> ⟨*Sp*<sub>1</sub>, ..., *Sp*<sub>n</sub>⟩*, if for all* M<sub>i</sub> ∈ Mod(*Sp*<sub>i</sub>) *we have* κ(M<sub>1</sub>, ..., M<sub>n</sub>) ∈ Mod(*Sp*). *The implementation involves a* decomposition *if* n > 1*.*
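The composition of constructors and the condition of Definition 8 can be sketched generically. In the following, models are opaque Python values, constructors are ordinary functions, and a specification is represented only by a membership predicate for its model class; these representations, and the toy "models" in the usage lines, are assumptions made for illustration. Since model classes are infinite in general, `is_constructor_implementation` can only test the condition on finite samples and does not establish it.

```python
from itertools import product

def compose_constructors(inner, arities, outer):
    """Return (k_1, ..., k_n); k as a function on the concatenated argument list."""
    def composed(*models):
        if len(models) != sum(arities):
            raise ValueError("wrong number of argument models")
        results, pos = [], 0
        for k, n in zip(inner, arities):
            results.append(k(*models[pos:pos + n]))
            pos += n
        return outer(*results)
    return composed

def is_constructor_implementation(is_model_of_sp, kappa, model_samples):
    """Test kappa(M_1, ..., M_n) in Mod(Sp) for all combinations of the sampled models."""
    return all(is_model_of_sp(kappa(*ms)) for ms in product(*model_samples))

# Toy stand-ins: "models" are frozensets of event names (purely hypothetical).
union = lambda m1, m2: m1 | m2
identity = lambda m: m
kappa = compose_constructors([union, identity], [2, 1], lambda m1, m2: m1 & m2)

print(sorted(kappa(frozenset("ab"), frozenset("c"), frozenset("bc"))))  # ['b', 'c']
samples = [[frozenset("ab")], [frozenset("c")], [frozenset("bc"), frozenset("b")]]
print(is_constructor_implementation(lambda m: "b" in m, kappa, samples))  # True
```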

The notion of simple implementation in Sect. 3.1 is captured by choosing the identity constructor. We now introduce a set of more advanced constructors in the context of ed signatures and edts. Let us first consider two central notions for constructors: signature morphisms and reducts. For data signatures A, A′, a *data signature morphism* σ : A → A′ is a function from A to A′. The σ-*reduct* of an A′-data state ω′ : A′ → D is given by the A-data state ω′|σ : A → D defined by (ω′|σ)(a) = ω′(σ(a)) for every a ∈ A. If A ⊆ A′, the injection of A into A′ is a particular data signature morphism and we denote the reduct of an A′-data state ω′ to A by ω′↾A. If A = A<sub>1</sub> ∪ A<sub>2</sub> is the disjoint union of A<sub>1</sub> and A<sub>2</sub> and ω<sub>i</sub> are A<sub>i</sub>-data states for i ∈ {1, 2}, then ω<sub>1</sub> + ω<sub>2</sub> denotes the unique A-data state ω with ω↾A<sub>i</sub> = ω<sub>i</sub> for i ∈ {1, 2}. The σ-reduct γ′|σ of a configuration γ′ = (c, ω′) is given by (c, ω′|σ), and is lifted to a set of configurations Γ′ by Γ′|σ = {γ′|σ | γ′ ∈ Γ′}.
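The operations on data states introduced in this paragraph are small enough to write down directly. The sketch below assumes that data states are Python dictionaries from attribute names to values and that a data signature morphism is a dictionary from source attributes to target attributes; the attribute names in the usage lines are illustrative only.

```python
# Data states as dicts; a data signature morphism as a dict from attributes of A
# to attributes of A'. These encodings are assumptions made for illustration.

def reduct(omega_prime, sigma):
    """(omega'|sigma)(a) = omega'(sigma(a)) for every a in the source signature."""
    return {a: omega_prime[sigma[a]] for a in sigma}

def restrict(omega_prime, attrs):
    """Reduct along the inclusion of attrs into the attributes of omega'."""
    return {a: omega_prime[a] for a in attrs}

def disjoint_sum(omega1, omega2):
    """omega1 + omega2 for data states over disjoint attribute sets."""
    if omega1.keys() & omega2.keys():
        raise ValueError("attribute sets must be disjoint")
    return {**omega1, **omega2}

atm_state = {"chk": False, "trls": 2}
print(restrict(atm_state, {"chk"}))                # {'chk': False}
print(reduct(atm_state, {"ok": "chk"}))            # {'ok': False}
w = disjoint_sum({"chk": True}, {"cnt": 7})
print(restrict(w, {"chk"}), restrict(w, {"cnt"}))  # {'chk': True} {'cnt': 7}
```

The last line illustrates the defining property of the sum: restricting ω<sub>1</sub> + ω<sub>2</sub> to either attribute set returns the corresponding summand.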

**Definition 9.** *An* ed signature morphism σ = (σ<sub>E</sub>, σ<sub>A</sub>) : Σ → Σ′ *is given by a function* σ<sub>E</sub> : E(Σ) → E(Σ′) *and a data signature morphism* σ<sub>A</sub> : A(Σ) → A(Σ′)*. We abbreviate both* σ<sub>E</sub> *and* σ<sub>A</sub> *by* σ*.*

**Definition 10.** *Let* σ : Σ → Σ′ *be an ed signature morphism and* M′ *a* Σ′*-edts. The* σ*-*reduct *of* M′ *is the* Σ*-edts* M′|σ = (Γ, R, Γ<sub>0</sub>) *such that* Γ<sub>0</sub> = Γ<sub>0</sub>(M′)|σ*, and* Γ *and* R = (R<sub>e</sub>)<sub>e∈E(Σ)</sub> *are inductively defined by* Γ<sub>0</sub> ⊆ Γ *and for all* e ∈ E(Σ)*,* γ′, γ″ ∈ Γ(M′)*: if* γ′|σ ∈ Γ *and* (γ′, γ″) ∈ R(M′)<sub>σ(e)</sub>*, then* γ″|σ ∈ Γ *and* (γ′|σ, γ″|σ) ∈ R<sub>e</sub>*.*
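The inductive definition of the σ-reduct amounts to a least-fixpoint computation, which can be sketched for finite edts as follows. The encoding is an assumption made for illustration: a configuration is a pair of a control state and a frozenset of (attribute, value) pairs, the relation is a dictionary from events to sets of transitions, and the two components of the signature morphism are dictionaries. The event `log` in the usage lines is hypothetical.

```python
def reduct_config(gamma, sigma_a):
    """Reduct of a configuration: keep the control state, translate the data state."""
    c, data = gamma
    values = dict(data)
    return (c, frozenset((a, values[sigma_a[a]]) for a in sigma_a))

def reduct_edts(init, rel, sigma_e, sigma_a):
    init_r = {reduct_config(g, sigma_a) for g in init}
    configs = set(init_r)
    rel_r = {e: set() for e in sigma_e}
    changed = True
    while changed:                          # least fixpoint of the inductive rule
        changed = False
        for e, e_prime in sigma_e.items():
            for (g1, g2) in rel.get(e_prime, set()):
                r1, r2 = reduct_config(g1, sigma_a), reduct_config(g2, sigma_a)
                if r1 in configs and (r1, r2) not in rel_r[e]:
                    rel_r[e].add((r1, r2))
                    configs.add(r2)
                    changed = True
    return configs, rel_r, init_r

def cfg(c, **attrs):
    return (c, frozenset(attrs.items()))

init = {cfg("Card", chk=False, trls=0)}
rel = {
    "insertCard": {(cfg("Card", chk=False, trls=0), cfg("PIN", chk=False, trls=0))},
    "log": {(cfg("PIN", chk=False, trls=0), cfg("Done", chk=False, trls=0))},
}
configs, rel_r, init_r = reduct_edts(init, rel, {"insertCard": "insertCard"}, {"chk": "chk"})
print(sorted(c for (c, _) in configs))  # ['Card', 'PIN']
```

The configuration with control state `Done` does not appear in the reduct because it is only reachable via an event outside the image of the signature morphism.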

**Definition 11.** *Let* σ : Σ → Σ′ *be an ed signature morphism. The* reduct constructor κ<sub>σ</sub> *from* Σ′ *to* Σ *maps any* M′ ∈ *Edts*<sup>E↓</sup>(Σ′) *to its reduct* κ<sub>σ</sub>(M′) = M′|σ*. Whenever* σ<sub>A</sub> *and* σ<sub>E</sub> *are bijective functions,* κ<sub>σ</sub> *is a* relabelling constructor*. If* σ<sub>E</sub> *and* σ<sub>A</sub> *are injective,* κ<sub>σ</sub> *is a* restriction constructor*.*

*Example 5.* The operational specification *ATM* is a constructor implementation of *Sp*<sub>1</sub> via the restriction constructor κ<sub>ι</sub> determined by the inclusion signature morphism ι : Σ(*Sp*<sub>1</sub>) → Σ(*ATM*), i.e., *Sp*<sub>1</sub> ⇝<sub>κ<sub>ι</sub></sub> *ATM*.

A further refinement technique for reactive systems (see, e.g., [8]) is the implementation of simple events by complex events, like their sequential composition. To formalise this as a constructor we use *composite events* Θ(E) over a given set of events E, given by the grammar θ ::= e | θ + θ | θ; θ | θ<sup>∗</sup> with e ∈ E. They are *interpreted* over an (E, A)-edts M by R(M)<sub>θ<sub>1</sub>+θ<sub>2</sub></sub> = R(M)<sub>θ<sub>1</sub></sub> ∪ R(M)<sub>θ<sub>2</sub></sub>, R(M)<sub>θ<sub>1</sub>;θ<sub>2</sub></sub> = R(M)<sub>θ<sub>1</sub></sub>; R(M)<sub>θ<sub>2</sub></sub>, and R(M)<sub>θ<sup>∗</sup></sub> = (R(M)<sub>θ</sub>)<sup>∗</sup>. Then we can introduce the intended constructor by means of reducts over signature morphisms mapping atomic to composite events:

**Definition 12.** *Let* Σ, Σ′ *be ed signatures,* D′ *a finite subset of* Θ(E(Σ′))*,* Δ′ = (D′, A(Σ′))*, and* α : Σ → Δ′ *an ed signature morphism. The* event refinement *constructor* κ<sub>α</sub> *from* Δ′ *to* Σ *maps any* M′ ∈ *Edts*<sup>E↓</sup>(Δ′) *to its reduct* M′|α ∈ *Edts*<sup>E↓</sup>(Σ)*.*
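The interpretation of composite events that underlies the event refinement constructor can be sketched for finite relations. The following assumes composite events encoded as nested tuples, relations as sets of pairs, and the star as reflexive-transitive closure over a given finite set of configurations; the encoding and the integer configurations in the usage lines are assumptions made for illustration.

```python
# Composite events as nested tuples: ("ev", e), ("+", t1, t2), (";", t1, t2), ("*", t).

def compose(r1, r2):
    """Sequential composition of two relations."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def star(r, configs):
    """Reflexive-transitive closure of r over the given configurations."""
    closure = {(g, g) for g in configs}
    frontier = set(closure)
    while frontier:
        new = compose(frontier, r) - closure
        closure |= new
        frontier = new
    return closure

def interpret(theta, rel, configs):
    tag = theta[0]
    if tag == "ev":
        return set(rel.get(theta[1], set()))
    if tag == "+":
        return interpret(theta[1], rel, configs) | interpret(theta[2], rel, configs)
    if tag == ";":
        return compose(interpret(theta[1], rel, configs), interpret(theta[2], rel, configs))
    if tag == "*":
        return star(interpret(theta[1], rel, configs), configs)
    raise ValueError(f"unknown composite event: {theta!r}")

# Configurations are abbreviated by integers here.
configs = {0, 1, 2, 3}
rel = {"enterPIN": {(0, 1)}, "verifyPIN": {(1, 2)}, "correctPIN": {(2, 3)}, "wrongPIN": {(2, 0)}}
refined = (";", ("ev", "enterPIN"),
           (";", ("ev", "verifyPIN"), ("+", ("ev", "correctPIN"), ("ev", "wrongPIN"))))
print(sorted(interpret(refined, rel, configs)))  # [(0, 0), (0, 3)]
```

The composite event used here has the shape enterPIN; verifyPIN; (correctPIN + wrongPIN), so a single abstract step corresponds to a three-step path in the refined structure.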

Finally, we consider a semantic, synchronous parallel composition constructor that allows for decomposition of implementations into components which synchronise on shared events. Given two composable signatures Σ<sub>1</sub> and Σ<sub>2</sub>, the *parallel composition* γ<sub>1</sub> ⊗ γ<sub>2</sub> of two configurations γ<sub>1</sub> = (c<sub>1</sub>, ω<sub>1</sub>), γ<sub>2</sub> = (c<sub>2</sub>, ω<sub>2</sub>) with ω<sub>1</sub> ∈ Ω(A(Σ<sub>1</sub>)), ω<sub>2</sub> ∈ Ω(A(Σ<sub>2</sub>)) is given by ((c<sub>1</sub>, c<sub>2</sub>), ω<sub>1</sub> + ω<sub>2</sub>), and lifted to two sets of configurations Γ<sub>1</sub> and Γ<sub>2</sub> by Γ<sub>1</sub> ⊗ Γ<sub>2</sub> = {γ<sub>1</sub> ⊗ γ<sub>2</sub> | γ<sub>1</sub> ∈ Γ<sub>1</sub>, γ<sub>2</sub> ∈ Γ<sub>2</sub>}.
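The composition of configurations and its lifting to sets can be sketched as follows, assuming a hashable encoding in which a configuration is a pair of a control state and a frozenset of (attribute, value) pairs. The state and attribute names in the usage lines are illustrative.

```python
def par_config(g1, g2):
    """gamma1 (x) gamma2 = ((c1, c2), omega1 + omega2) for composable signatures."""
    (c1, w1), (c2, w2) = g1, g2
    if {a for a, _ in w1} & {a for a, _ in w2}:
        raise ValueError("signatures are not composable: shared attributes")
    return ((c1, c2), w1 | w2)

def par_configs(gs1, gs2):
    """Lifting to sets of configurations: all pairwise compositions."""
    return {par_config(g1, g2) for g1 in gs1 for g2 in gs2}

atm = {("PIN", frozenset({("chk", False), ("trls", 1)}))}
cc = {("Idle", frozenset({("cnt", 0)})), ("Busy", frozenset({("cnt", 0)}))}
for (ctrl, data) in sorted(par_configs(atm, cc), key=lambda g: g[0]):
    print(ctrl, dict(sorted(data)))
```

The disjointness check mirrors the composability requirement on ed signatures: without it, the sum of the two data states would not be well defined.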

**Definition 13.** *Let* Σ<sub>1</sub>, Σ<sub>2</sub> *be composable ed signatures. The* parallel composition constructor κ<sub>⊗</sub> *from* (Σ<sub>1</sub>, Σ<sub>2</sub>) *to* Σ<sub>1</sub> ⊗ Σ<sub>2</sub> *maps any* M<sub>1</sub> ∈ *Edts*<sup>E↓</sup>(Σ<sub>1</sub>)*,* M<sub>2</sub> ∈ *Edts*<sup>E↓</sup>(Σ<sub>2</sub>) *to* M<sub>1</sub> ⊗ M<sub>2</sub> = (Γ, R, Γ<sub>0</sub>) ∈ *Edts*<sup>E↓</sup>(Σ<sub>1</sub> ⊗ Σ<sub>2</sub>)*, where* Γ<sub>0</sub> = Γ<sub>0</sub>(M<sub>1</sub>) ⊗ Γ<sub>0</sub>(M<sub>2</sub>)*, and* Γ *and* R = (R<sub>e</sub>)<sub>e∈E(Σ<sub>1</sub>)∪E(Σ<sub>2</sub>)</sub> *are inductively defined by* Γ<sub>0</sub> ⊆ Γ *and*


An obvious question is how the semantic parallel composition constructor is related to the syntactic parallel composition of operational ed specifications.

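**Proposition 1.** *Let* O<sub>1</sub>, O<sub>2</sub> *be operational ed specifications with composable signatures. Then* Mod(O<sub>1</sub>) ⊗ Mod(O<sub>2</sub>) ⊆ Mod(O<sub>1</sub> ∥ O<sub>2</sub>)*, where* Mod(O<sub>1</sub>) ⊗ Mod(O<sub>2</sub>) *denotes* κ<sub>⊗</sub>(Mod(O<sub>1</sub>), Mod(O<sub>2</sub>))*.*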

The converse Mod(O<sub>1</sub> ∥ O<sub>2</sub>) ⊆ Mod(O<sub>1</sub>) ⊗ Mod(O<sub>2</sub>) does not hold: Consider the ed signature Σ = (E, A) with E = {e}, A = ∅, and the operational ed specifications O<sub>i</sub> = (Σ, C<sub>i</sub>, T<sub>i</sub>, (c<sub>i,0</sub>, ϕ<sub>i,0</sub>)) for i ∈ {1, 2} with C<sub>1</sub> = {c<sub>1,0</sub>}, T<sub>1</sub> = {(c<sub>1,0</sub>, true, e, false, c<sub>1,0</sub>)}, ϕ<sub>1,0</sub> = true; and C<sub>2</sub> = {c<sub>2,0</sub>}, T<sub>2</sub> = ∅, ϕ<sub>2,0</sub> = true. Then Mod(O<sub>1</sub>) = ∅, since the precondition true forces an e-transition whose effect false can never be satisfied, but Mod(O<sub>1</sub> ∥ O<sub>2</sub>) = {M} with M showing just the initial configuration.

The next theorem shows the usefulness of the syntactic parallel composition operator for proving implementation correctness when a (semantic) parallel composition constructor is involved. The theorem is a direct consequence of Proposition 1 and Definition 8.

**Theorem 3.** *Let Sp be an (axiomatic or operational) ed specification,* O<sub>1</sub>, O<sub>2</sub> *operational ed specifications with composable signatures, and* κ *an implementation constructor from* Σ(O<sub>1</sub>) ⊗ Σ(O<sub>2</sub>) *to* Σ(*Sp*)*: If Sp* ⇝<sub>κ</sub> O<sub>1</sub> ∥ O<sub>2</sub>*, then Sp* ⇝<sub>κ<sub>⊗</sub>;κ</sub> ⟨O<sub>1</sub>, O<sub>2</sub>⟩*.*

*Example 6.* We finish the refinement chain for the ATM specifications by applying a decomposition into two parallel components. The operational specification *ATM* of Example 3 (and Example 5) describes the interface behaviour of an ATM interacting with a user. For a concrete realisation, however, an ATM will also interact internally with other components, e.g., a clearing company which supports the ATM in verifying PINs. Our last refinement step hence realises the *ATM* specification by two parallel components, represented by the operational specification *ATM*′ in Fig. 2a and the operational specification *CC* of a clearing company in Fig. 2b. Both communicate (via shared events) when an ATM sends a verification request, modelled by the event verifyPIN, to the clearing company. The clearing company may answer with correctPIN or wrongPIN and then the ATM continues following its specification. For the implementation construction we use the parallel composition constructor κ<sub>⊗</sub> from (Σ(*ATM*′), Σ(*CC*)) to Σ(*ATM*′) ⊗ Σ(*CC*). The signature of *CC* consists of the events shown on the transitions in Fig. 2b. Moreover, there is one integer-valued attribute cnt counting the number of verification tasks performed. The signature of *ATM*′ extends Σ(*ATM*) by the events verifyPIN, correctPIN and wrongPIN. To fit the signature and the behaviour of the parallel composition of *ATM*′ and *CC* to the specification *ATM* we must therefore compose κ<sub>⊗</sub> with an event refinement constructor κ<sub>α</sub> such that α(enterPIN) = (enterPIN; verifyPIN; (correctPIN + wrongPIN)); for the other events α is the identity and for the attributes the inclusion. The idea is therefore that the refinement looks like *ATM* ⇝<sub>κ<sub>⊗</sub>;κ<sub>α</sub></sub> ⟨*ATM*′, *CC*⟩. To prove this refinement relation we rely on the syntactic parallel composition *ATM*′ ∥ *CC* shown in Fig. 2c, and on Theorem 3. It is easy to see that *ATM* ⇝<sub>κ<sub>α</sub></sub> *ATM*′ ∥ *CC*. In fact, all transitions for event enterPIN in Fig. 1 are split into several transitions in Fig. 2c according to the event refinement defined by α. For instance, the loop transition from *PIN* to *PIN* with precondition trls < 2 in Fig. 1 is split into the cycle from (*PIN*, *Idle*) via (*PINEntered*, *Idle*) and (*Verifying*, *Busy*) back to (*PIN*, *Idle*) in Fig. 2c. Thus, we have *ATM* ⇝<sub>κ<sub>α</sub></sub> *ATM*′ ∥ *CC* and can apply Theorem 3 such that we get *ATM* ⇝<sub>κ<sub>⊗</sub>;κ<sub>α</sub></sub> ⟨*ATM*′, *CC*⟩.

**Fig. 2.** Operational ed specifications ATM′, CC and their parallel composition

## **5 Conclusions**

We have presented a novel logic, called E<sup>↓</sup>-logic, for the rigorous formal development of event-based systems incorporating changing data states. To the best of our knowledge, no other logic supports the full development process for this kind of system, ranging from abstract requirements specifications, expressible by the dynamic logic features, to the concrete specification of implementations, expressible by the hybrid part of the logic.

The temporal logic of actions (TLA [13]) supports also stepwise refinement where state transition predicates are considered as actions. In contrast to TLA we model also the events which cause data state transitions. For writing concrete specifications we have proposed an operational specification format capturing (at least parts of) similar formalisms, like Event-B [1], symbolic transition systems [17], and UML protocol state machines [16]. A significant difference to Event-B machines is that we distinguish between control and data states, the former being encoded as data in Event-B. On the other hand, Event-B supports parameters of events which could be integrated in our logic as well. An institution-based semantics of Event-B has been proposed in [7] which coincides with our semantics of operational specifications for the special case of deterministic state transition predicates. Similarly, our semantics of operational specifications coincides with the unfolding of symbolic transition systems in [17] if we instantiate our generic data domain with algebraic specifications of data types (and consider again only deterministic state transition predicates). The syntax of UML protocol state machines is about the same as the one of operational event/data specifications. As a consequence, all of the aforementioned concrete specification formalisms (and several others) would be appropriate candidates for integration into a development process based on E<sup>↓</sup>-logic.

There remain several interesting tasks for future research. First, our logic is not yet equipped with a proof system for deriving consequences of specifications. This would also support the proof of refinement steps which is currently achieved by purely semantic reasoning. A proof system for E<sup>↓</sup>-logic must cover dynamic and hybrid logic parts at the same time, like the proof system in [15], which, however, does not consider data states, and the recent calculus of [5], which extends differential dynamic logic but does not deal with events and reactions to events. Both proof systems could be appropriate candidates for incorporating the features of E<sup>↓</sup>-logic. Another issue concerns the separation of events into input and output as in I/O-automata [14]. Then also communication compatibility (see [2] for interface automata without data and [3] for interface theories with data) would become relevant when applying a parallel composition constructor.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Model-Driven Development and Model Transformation

# **Pyro: Generating Domain-Specific Collaborative Online Modeling Environments**

Philip Zweihoff (✉), Stefan Naujokat, and Bernhard Steffen

Chair for Programming Systems, TU Dortmund University, Dortmund, Germany {philip.zweihoff,stefan.naujokat,bernhard.steffen}@tu-dortmund.de

**Abstract.** We present Pyro, a framework for enabling domain-specific modeling via the internet. Provided with an adequate metamodel specification, Pyro turns your browser into a collaborative, domain-specific, graphical development environment with features reminiscent of desktop IDEs for textual programming languages. The required metamodeling is supported in a high-level, simplicity-driven fashion, and the entire ready-to-run browser-based domain-specific development environment is generated fully automatically. We will illustrate the steps of this development along the realization of a graphical IDE for the Architecture Analysis and Design Language (AADL).

## **1 Introduction**

Domain-specific languages (DSLs) aim at closing the gap between domain knowledge and software development by explicitly supporting the required domain concepts. Graphical domain-specific languages have turned out to be particularly suitable for domain experts without any programming background. The bottleneck in practice is the enormous effort to develop the required domain-specific graphical modeling tools. The Cinco *SCCE Meta Tooling Suite* [26] has been designed to overcome this bottleneck by providing a holistic, simplicity-driven [22] approach for the creation of such domain-specific graphical modeling tools. A key feature of Cinco is that it generates the entire graphical modeling environment (referred to as 'Cinco Products' in the remainder of the paper) from high-level specifications of the defined model structures and functionalities. The (translational) semantics of the specified modeling language is defined in terms of code generation, model transformation, evaluation, and/or interpretation [20]. Cinco Products are Eclipse-based, graphical modeling tools that are realized via a number of Eclipse plug-ins [13]. Thus, setting up a Cinco Product involves some technical aspects that are beyond the competence of typical domain experts, and it becomes even more tedious when one wants to enable cooperative development.

In this paper, we present *Pyro*, a tool that enables one to generate Cinco Products for collaborative modeling that run in a web browser. Conceptually, Pyro borrows from modern online editors for collaborative work, like Google Docs, Microsoft Office 365, or solutions like ShareLaTeX/Overleaf that even free one from maintaining a corresponding build and runtime environment.

**Fig. 1.** Cinco generation architecture.

Key to the realization of Pyro is that Cinco follows a fully generative approach on the meta level, which allows one to modularly 'retarget' the Cinco Product Generation for the web (cf. Fig. 1). Technically, Pyro web modeling environments utilize *DyWA* [27] (Dynamic Web Application) for data modeling, empowering prototype-driven application development.

In order to achieve this retargeting and to enable collaborative work, Pyro needs to, in particular, compensate for all the required functionality provided by the Eclipse platform, like the EMF framework with GMF or Graphiti for graphical editors. Altogether, this poses the following three key challenges:


In the course of this tool paper, Pyro is illustrated along the development of a graphical modeling environment for the *Architecture Analysis and Design Language* (AADL), an SAE standard for modeling the architecture of embedded real-time systems [29]. Cinco was used to develop a graphical AADL modeling tool supporting a subset of AADL's features tailored to be used in teaching [28], where it replaces the graphical editor of the OSATE tool [8] (AADL's reference implementation). Furthermore, a dedicated code generator was developed to support verification with behavior specified with the BLESS language [17]. Another example, in which Pyro realizes a DSL for point-and-click adventures, can be found in [21].

**Fig. 2.** Pyro web-based modeling environment for the AADL language.

Figure 2 shows the web-based graphical AADL editor in Pyro<sup>1</sup>. We will use this editor in the remainder of this paper to illustrate Cinco's and Pyro's core ideas and concepts. The user interface is designed after commonly known concepts from integrated development environments, like Eclipse or IntelliJ. The main area in the center is covered by the *modeling canvas* showing the currently edited model. On the right, there is the *palette* showing the available types of modeling elements. They can be placed onto the canvas just by drag&drop. The attributes of the currently selected element in the editor can be set via the *properties* view at the bottom. The *validation* view (bottom right corner) constantly checks for the syntax and static semantics of the model in the canvas and provides appropriate error or warning messages. Finally, a *project explorer* and a *menu bar* complete the IDE-like appearance.

The remainder of the paper is organized as follows: Sect. 2 briefly describes the use of Cinco's specification languages to define a sophisticated graphical modeling language, and Sect. 3 explains the generation of a web-based environment and the resulting architecture. The mechanisms and techniques used to enable simultaneous collaboration are explained in Sect. 4. The paper closes with a summary, related work, and an outlook on future development in Sect. 5.

<sup>1</sup> The editor is available for experimentation on the Pyro website: https://pyro.scce.info.

## **2 DSL Development with Cinco**

Cinco is a language workbench [11] for the simplicity-driven development of graphical modeling environments that are domain-specific [12], support full code generation [10,15], and easily integrate existing solutions in the form of services [23]. As Cinco is itself a meta-level application of these principles [25], it is specialized to the domain of 'graph-based graphical modeling tools' and fully generates such tools from meta-level descriptions (models) – the key enabling factor for the whole Pyro approach. Primarily relevant in this regard are two Cinco metamodeling languages:<sup>2</sup>


With these meta-level specification files, the Cinco Product Generator (which is part of Cinco) generates plug-ins for the Eclipse Rich Client Platform (RCP) that realize the editor based on the Eclipse Modeling Framework (EMF) and the Graphiti graphical editor framework. Further additions to the editor, which are not covered by these two specification files, can be injected in an aspect-oriented fashion [16]: Cinco provides a so-called mechanism of *hooks* that are triggered on the occurrence of certain events, for instance, when a node is created, moved, or deleted. Hooks are inserted into the MGL file with *annotations* on the model elements defined therein. The effect of a hook can either be modeled in a transformation language [20] or directly be written as Java code using the generated model API. In the context of the AADL editor, e.g., a postMoveHook is used to move a port to the nearest border within its container after it has been moved by the user. This results in a very natural 'snapping to the border' effect during modeling.
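The 'snapping to the border' behavior of such a hook can be illustrated with a small geometric sketch. It is written in Python for brevity, whereas the hooks described above are written in Java against the generated model API; the names `Rect` and `snap_to_nearest_border` are hypothetical and not part of Cinco.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle; (x, y) is the top-left corner (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def snap_to_nearest_border(port: Rect, container: Rect) -> Rect:
    """Return a copy of `port` whose center lies on the closest container border."""
    cx = port.x + port.width / 2
    cy = port.y + port.height / 2
    # Clamp the center into the container so ports dropped outside are handled too.
    cx = min(max(cx, container.x), container.x + container.width)
    cy = min(max(cy, container.y), container.y + container.height)
    # Distance from the port center to each of the four borders.
    distances = {
        "left": cx - container.x,
        "right": container.x + container.width - cx,
        "top": cy - container.y,
        "bottom": container.y + container.height - cy,
    }
    side = min(distances, key=distances.get)
    if side == "left":
        cx = container.x
    elif side == "right":
        cx = container.x + container.width
    elif side == "top":
        cy = container.y
    else:
        cy = container.y + container.height
    return Rect(cx - port.width / 2, cy - port.height / 2, port.width, port.height)
```

A post-move hook along these lines would be called with the port's new bounds after the user's move and would write the returned position back to the model.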

As Cinco follows a fully generative approach, the very same specification files are utilized by Pyro to generate a web-based modeling editor that runs in the browser (cf. Fig. 1). Of course, in this context, the running platform will no longer be based on Eclipse, but on common web frameworks like Angular for the frontend and Java EE for the backend. The aspects of a Cinco Product included in a service-oriented fashion via native components written in Java (for instance a code generator or editor-assisting features like the hooks discussed above) can thus directly be run also in the backend of the Pyro editor.

<sup>2</sup> For a more elaborate introduction on how to define a graphical editor with Cinco, as well as other case studies and exemplary modeling languages, please refer to [26].

In the following, we will focus on two particularly important aspects of Pyro: After discussing the frontend/backend architecture of the generated Pyro modeling environments in Sect. 3, we will take a deeper look at the communication pattern between the involved components that facilitates synchronous collaborative modeling (cf. Sect. 4).

## **3 Architecture**

In contrast to developing an Eclipse-based modeling environment, for the realization of a web-based solution one nearly has to start from scratch. Eclipse itself is built on a huge number of plug-ins, developed over the past seventeen years. In particular, the Eclipse Modeling Project provides many frameworks for developing modeling languages based on metamodels and bundling them into a rich IDE. In the context of the web, development of integrated environments has just started, so that only a few best practices, plug-ins, and frameworks are available. This means that even fundamental features often have to be implemented to enable basic functionalities. The main difference between local desktop IDEs and a web-based environment like Pyro is the opportunity to provide distributed access to a centralized instance by multiple users at the same time. This results in new challenges and requirements regarding the synchronization between multiple users and the resolution of conflicting modifications.

Thus, the Pyro architecture must be built in a way that adequately substitutes what Eclipse already provides in the desktop application context, but also be prepared for the distributed setting with multiple users – in particular for supporting live collaborative editing on the same models. In this section, the generation of Pyro web-based modeling environments is described in a way that shows how the needed information is collected from Cinco's high-level specification metamodels and where the generated code is placed and distributed in the overall context to build the Pyro architecture.

The previously introduced specification of the AADL modeling language constitutes the source for the tool generation step. After the Pyro generator is triggered, all MGL and MSL files for a Cinco-based modeling tool are collected to gather the required information. At this point, all modeling languages, including their available node and edge types, are visible for the generator.

In the next step, a template of the modeling environment web application is created. The gray parts with dotted borders in Fig. 3 show the static elements independent of the given language specification, whereas the blue parts with solid borders are specifically generated from this specification. The template consists of a *DyWA*-based backend, extended by a specific *Domain Layer* for communication. On the client side, some general parts provide *Registration*, *Login*, and *Project Management*, but the main component is the specific *Editor* generated to handle instances of the graphical modeling language. The underlying single-page web application framework *Angular Dart* [1] is utilized to enable the required features of a rich internet application, like versatile user interaction and asynchronous communication.

**Fig. 3.** Overall architecture of the generated web-based modeling environment.

Essentially, in the backend, the challenge of providing the metamodel-based model handling (persistence, API, event handling, etc.) is solved, which in the Cinco desktop client world is provided by the EMF framework. The frontend, on the other hand, realizes the rich IDE-like frame application with the graphical editor for the models. In the following, these two parts are explained in more detail to show how the different layers are connected and which parts are generated to establish the entire integrated environment.

#### **3.1 Backend**

The backend of a modeling environment generated using Pyro consists of two main layers: One is responsible for the centralized persistence of model instances, the other for receiving and distributing modifications. The lowest level of the web application is the database to store information in a centralized fashion. This layer handles the representation of predefined metamodels for the given domain-specific languages. Pyro modeling environments utilize the *DyWA* as an abstraction layer of a database to store types and objects in a dynamic and loosely coupled fashion [27]. Based on the specified languages' node and edge types, a *Domain Data Plug-in* (see Fig. 4) is generated by Pyro which declares types, associations, attributes, and inheritance. The main reason for using the *DyWA* as model layer is its *Domain Generator*, which generates a specific *DyWA API* providing entities and controllers for the previously given types to handle their instances on a simplified layer above the database. This closely resembles the APIs generated by EMF in the Eclipse world, which greatly reduces the effort of generating the required *CINCO API* adapters. These adapters provide functionality with signatures identical to those of EMF, so that already existing code can be applied directly (see below). Beyond that, DyWA is prepared for dynamic change of the metamodel, which becomes necessary during modeling language evolution (see [19]).

**Fig. 4.** Backend component architecture and interaction.

Since Cinco supports extending the definition of graphical modeling languages by user-written Java code for hooks, actions, validation checks, and code generators, a holistic reuse mechanism has to be provided in the context of Pyro. To meet this goal, the same Cinco interfaces are rebuilt in the generated web-based modeling environment, providing the same structure and identical signatures. As a result, the domain-specific interfaces (see Fig. 4, *CINCO API*) generated by Pyro are compatible with the ones Cinco generates for Eclipse and EMF, so that these extensions can use them identically. In contrast to the desktop-based Cinco Product, a Pyro graph model instance is not persisted in a file on the local system. The Pyro web modeling environment as a distributed system utilizes the DyWA database for storage and centralized access as a server. Thus, the *CINCO API* is internally connected to the corresponding generated *DyWA API* to persist changes in the database, which is hidden from the extensions.

Multi-user collaborative editing with the generated domain-specific modeling languages is one of the main challenges for Pyro. All changes to a centrally held instance of a graph model have to be shared with all participants. To distribute the changes performed on a graph model by calling the *CINCO API* methods, a *Command Stack* is used to store each individual modification. Since Cinco provides hooks for aspect-oriented extensions, a single action like the movement of a node on the canvas can result in multiple successive commands. As a result, all modifications of a model or any of its elements at runtime are encoded in commands and sequentially stored in the stack. The commands recorded during *CINCO API* usage are used both to synchronize different clients looking at the same model and to realize the redo and undo functionalities. This synchronization mechanism is described in more detail in Sect. 4.
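The role of such a command stack for undo and redo can be sketched as follows. This is a minimal illustration in Python, not Pyro's implementation: the `Command` and `CommandStack` types are hypothetical, and grouping the commands of one user action into a list is an assumption made here so that a move and its hook-induced follow-up are undone together.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Command:
    element_id: str
    old: Dict[str, Any]  # property values before the change
    new: Dict[str, Any]  # property values after the change


@dataclass
class CommandStack:
    model: Dict[str, Dict[str, Any]]
    done: List[List[Command]] = field(default_factory=list)
    undone: List[List[Command]] = field(default_factory=list)

    def execute(self, commands: List[Command]) -> None:
        """Apply one user action, which may comprise several commands (e.g., from hooks)."""
        for command in commands:
            self.model[command.element_id].update(command.new)
        self.done.append(commands)
        self.undone.clear()  # a fresh action invalidates the redo history

    def undo(self) -> None:
        commands = self.done.pop()
        for command in reversed(commands):  # revert in reverse order
            self.model[command.element_id].update(command.old)
        self.undone.append(commands)

    def redo(self) -> None:
        commands = self.undone.pop()
        for command in commands:
            self.model[command.element_id].update(command.new)
        self.done.append(commands)
```

Because every command carries both the old and the new property values, the same recorded sequence serves undo, redo, and distribution to other clients.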

To use the web modeling environment in a desktop application fashion, an uninterruptible user interaction is necessary. Thus, Pyro utilizes REST-based asynchronous communication for non-blocking data exchange. As a result of this, the outermost component of the generated web application is a *REST Interface*. The interface consists of *Static Endpoints* for project, file, and user management, which are independent from the given modeling languages. These parts are supplemented by generated *Endpoints*, which are based on the Cinco specification and provide methods to create, read, update, and delete (CRUD) a single graph model. In addition to this, the interface contains the central endpoint for commands sent from a client's frontend to the backend. Depending on the used *Extensions*, additional *Endpoints* are generated to fetch and trigger user-written actions or a generator.

#### **3.2 Frontend**

To mimic the look and feel of a local desktop modeling environment, the web-based variant generated by Pyro has to provide versatile user interactions. As a result of this, the *Frontend* of the generated web application (see Fig. 5), which realizes the interface for the user, is focused on quick responses and familiar input behavior. To achieve this goal, the frontend part of a web modeling environment is built upon the *Angular Dart* [1] framework, which is used to realize single-page web applications with built-in cross-platform support and comprises an architecture focused on reusable components. In addition to this, it is tailored to asynchronous user interaction and client-side routing, so that it can be used to build rich internet applications, like, for instance, ones resembling integrated development environments (IDEs).

In contrast to a local desktop application, a web application requires additional multi-user focused interfaces. Therefore, the template for the frontend, which is initially created, consists of static user interfaces for *Registration* and *Login* as well as a *Project Management* area to create, edit, and share projects. The specifically generated parts are used by the *Editor*, which comprises domain-specific components. Its user interface is similar to the known Eclipse IDE used by regular Cinco Products (see Fig. 2).

The challenge of preventing delays in the system's response to user input, in order to enable fluent interaction, can be met by avoiding synchronous communication with the backend. The *Editor* facilitates this frontend-side computation by two layers used to interact with instances of the graph models. The *Mirror Layer* stores a snapshot of the model present in the database, whereas the *Interaction Layer* is a direct representation of the visible graph, which can be modified by the user. This separation yields a delta between the last valid graph, stored in the *Mirror Layer*, and the currently visible graph. Thanks to this, generated syntactical validators (e.g., for ensuring lower bounds of given cardinalities) can raise errors, and the appropriate rollback operation works immediately on the client side without additional communication with the backend.

**Fig. 5.** Frontend architecture.
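The interplay of the two layers can be sketched in a few lines. The following Python illustration is a simplification under stated assumptions: graphs are plain dictionaries, and the names `EditorLayers`, `at_least`, `delta`, and `commit_or_rollback` are hypothetical rather than taken from the generated Dart frontend.

```python
import copy
from typing import Any, Callable, Dict, List, Optional

Graph = Dict[str, Dict[str, Any]]
Validator = Callable[[Graph], List[str]]


def at_least(node_type: str, lower: int) -> Validator:
    """Build a validator for a lower cardinality bound on one node type."""
    def check(graph: Graph) -> List[str]:
        count = sum(1 for element in graph.values() if element.get("type") == node_type)
        if count >= lower:
            return []
        return [f"expected at least {lower} {node_type} node(s), found {count}"]
    return check


class EditorLayers:
    def __init__(self, snapshot: Graph, validators: List[Validator]):
        self.mirror: Graph = copy.deepcopy(snapshot)       # last valid graph
        self.interaction: Graph = copy.deepcopy(snapshot)  # graph visible on the canvas
        self.validators = validators

    def delta(self) -> Dict[str, Optional[Dict[str, Any]]]:
        """Elements that differ between the layers; deleted elements map to None."""
        ids = set(self.mirror) | set(self.interaction)
        return {
            i: self.interaction.get(i)
            for i in ids
            if self.mirror.get(i) != self.interaction.get(i)
        }

    def commit_or_rollback(self) -> List[str]:
        """Validate the visible graph; restore the mirror snapshot on any error."""
        errors = [msg for validate in self.validators for msg in validate(self.interaction)]
        if errors:
            self.interaction = copy.deepcopy(self.mirror)
        else:
            self.mirror = copy.deepcopy(self.interaction)
        return errors
```

Both the validation and the rollback operate purely on client-side state, which is what makes the response immediate.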

Pyro specifically aims at supporting users switching from already existing Cinco Products to the web-based modeling environment. Thus, the *Editor*, which is the main part of the frontend, provides multiple components similar to the Eclipse IDE. To not confuse users, functions, behavior and arrangement are recreated. Besides common user interface parts like a project explorer and a menu, specific components for the modeling environment are generated, like the *Canvas*, a *Properties View*, and the *Palette*.

The *Canvas* is based on the *JointJS* framework [9], which in general renders SVGs and adds versatile user interaction for manipulation of nodes and edges via drag&drop functionalities. As a result, the web modeling environment running in a browser provides handling very similar to that of the Eclipse-based desktop application with its Graphiti editor. The exact replication of the node and edge appearance is a central goal of the generated *Canvas*. Ideally, a user cannot distinguish between a Pyro and a Cinco visualization of a graph model. This requires the same hierarchical shape structure for the web as in the Graphiti editor, which can be realized by scalable vector graphics (SVGs). The *SVG Markup*, which defines the shapes and styling information of the nodes and edges, is generated based on the concrete syntax specified in the MSL files of Cinco. The *JointJS* framework and *SVG Markup* files are observed by a domain-specific *User Event Controller*, which realizes the listeners and stream handling mechanisms for a single graph model to modify the underlying layers.

Besides the distinct and visible modifications available directly in the *Canvas*, attributes of an edge, node, or the graph model (as defined in the MGL metamodel) can be modified using the *Properties View*. It has a generic frame based on a tree view to recursively walk through associated types of the currently selected element. For every type present in an MGL file, a form for editing the primitive attributes (e.g., string, Boolean, or integer) is generated. The individual fields are tailored to the specified data type of the attribute, to give as much support as possible. Thanks to the two-way data binding of the underlying Angular framework, every change to an attribute is immediately propagated to the underlying layer.

The *Palette* is generated based on the given MGL specifications. It lists all node types available for modeling. In addition to this, the optional annotations of the MGL, e.g. for grouping nodes and dedicated icons for visual support, are considered as well.

## **4 Collaborative Editing**

One of the main features of modeling environments generated by Pyro is the simultaneous editing of graph models by multiple clients. The continuous synchronization between clients avoids classical revision control repositories for distributed access and instead enables immediate collaboration. To reach the goal of simultaneous synchronization, different aspects have been considered to maintain consistency and scalability and to achieve a real-time effect.

In this section, the mechanism used for Pyro web-based modeling environments to communicate is presented and explained. The first part discusses the different challenges of a distributed system with respect to the domain of graphical modeling environments, whereas the second part describes the realization of the command pattern used to exchange modifications on a graph model.

#### **4.1 Simultaneous Synchronization Mechanism**

The main communication concept of a modeling environment generated by Pyro as a distributed system is the *optimistic replication strategy* [30]. This concept replicates data and allows the single replicas to diverge, which in the context of Pyro is realized by the separated graph model replicas held in each client. The optimistic replication belongs to the *eventually consistent* consistency model and is furthermore classified as *basically available, soft state and eventually consistent* (BASE) [36]. It benefits from high availability, since it only exchanges updates on given items. In the context of a web-based modeling environment, the updates are based on the modifications a client can make to a node or edge. To enable conflict resolution and maintain consistency regarding commutativity and idempotency, *conflict-free replicated data types* (CRDTs) are represented by commands. CRDT was originally used for text-based synchronization as a simplification of *operational transformation* [34]. It utilizes an additional data structure, based on an identifier of the client, the changed value, and the position, to create a unique identifier for each changed character of the text. Regarding the graph models handled by Pyro, CRDTs are realized by commands for each type of possible model element modification, which store a unique identifier and the changed properties of the relevant element. In addition to this, the previous values of the updated properties are stored as well, to enable rollback, undo, and redo functionalities. Thus, Pyro uses operation-based and state-based CRDTs. Thanks to the CRDTs, conflicts arising from simultaneous edits of the same model element can be detected. In the context of graphical DSLs, conflicts arise when a modification violates the static semantics defined in the metamodel. If a conflict is detected, the corresponding command is flagged for rollback and returned to its sender. The client then inverts the modification encoded by the command and applies it to revert the conflicting change.
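The rollback step can be made concrete with a small sketch. It is an illustration in Python under stated assumptions, not a full CRDT and not Pyro's code: `UpdateCommand`, `Replica`, the `rollback` flag, and the `":inv"` identifier suffix are hypothetical. The sketch shows how storing previous values lets a client invert a flagged command, and how a unique command identifier makes repeated delivery harmless.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set


@dataclass
class UpdateCommand:
    command_id: str        # unique per command, e.g. "<client>:<counter>"
    element_id: str
    old: Dict[str, Any]    # previous values of the updated properties
    new: Dict[str, Any]    # changed properties
    rollback: bool = False  # set by the server when a conflict is detected

    def inverted(self) -> "UpdateCommand":
        """Command that reverts this one by swapping old and new values."""
        return UpdateCommand(self.command_id + ":inv", self.element_id, self.new, self.old)


class Replica:
    """Client-side copy of a graph model (hypothetical, simplified)."""

    def __init__(self, elements: Dict[str, Dict[str, Any]]):
        self.elements = elements
        self.seen: Set[str] = set()

    def apply(self, command: UpdateCommand) -> None:
        if command.command_id in self.seen:  # idempotent: replays change nothing
            return
        self.seen.add(command.command_id)
        self.elements[command.element_id].update(command.new)

    def handle_reply(self, command: UpdateCommand) -> None:
        """Revert a local change that the server flagged for rollback."""
        if command.rollback:
            self.apply(command.inverted())
```

Delivering the same flagged command twice reverts the change only once, since the inverted command carries its own identifier.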

#### **4.2 Distributed Command Pattern**

The distribution of modifications made to a graph model in the Pyro web modeling environment is realized by a *command pattern* [14]. The command pattern is a behavioral design pattern that encapsulates all information needed to perform an update on an object. The commands are sent as HTTP POST requests that combine the command with the graph model and client identifiers. An exemplary collaboration of two clients (red and green) modifying the same graph model simultaneously is presented in Fig. 6.

After the initial read from the database, a client only calculates, exchanges and receives commands when a modification is done (see Fig. 6(1)). For every possible change on nodes and edges (e.g., moving a node or bending an edge), a dedicated command encoding the modification is created and sent to the server, extended with a unique identifier of the sender. Thanks to this assignment, all commands can be differentiated (see red commands by client A and green commands by client B in Fig. 6). As an example, the command for the creation of a node consists of the node type, the position and an identifier of the container where it should be instantiated. Other commands, e.g., the move node commands, contain information of the previous as well as the new position, so that they store the delta of the modification.
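The two command kinds described above could be encoded as follows. This Python sketch only illustrates the information content; the class names, field names, and the JSON layout (`graphModelId`, `clientId`, `command`, `type`) are assumptions and do not reproduce Pyro's actual wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CreateNodeCommand:
    node_type: str
    x: int
    y: int
    container_id: str  # container in which the node is instantiated


@dataclass
class MoveNodeCommand:
    node_id: str
    old_x: int  # previous position, kept so the command stores the delta
    old_y: int
    x: int      # new position
    y: int


def to_payload(command, graph_model_id: str, client_id: str) -> str:
    """Serialize a command together with the graph model and sender identifiers."""
    body = asdict(command)
    body["type"] = type(command).__name__  # lets the receiver recreate the command type
    return json.dumps(
        {"graphModelId": graph_model_id, "clientId": client_id, "command": body},
        sort_keys=True,
    )
```

The resulting string would form the body of the HTTP POST request sent to the command endpoint.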

The *Serializer* (see Fig. 6(2)) is used to parse the received payload and assign the commands to the associated *Command Applier*. Thanks to additional reflective *type* annotations, the received payload can be parsed to recreate the correct command type. The assignment depends on the given graph model type the command belongs to.
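A type-annotated payload can be turned back into typed commands with a simple registry. The sketch below is in Python and rests on assumptions: the command classes, the `type` and `graphModelType` keys, and the `Serializer.dispatch` method are hypothetical stand-ins for the generated Java classes.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class MoveNode:
    node_id: str
    x: int
    y: int


@dataclass
class RemoveEdge:
    edge_id: str


# Registry from the reflective type annotation to the command class.
COMMAND_TYPES = {"MoveNode": MoveNode, "RemoveEdge": RemoveEdge}


def parse_command(raw: Dict[str, Any]):
    """Recreate the correct command type from its 'type' annotation."""
    fields = dict(raw)
    command_class = COMMAND_TYPES[fields.pop("type")]
    return command_class(**fields)


class Serializer:
    def __init__(self, appliers: Dict[str, Callable[[Any], None]]):
        self.appliers = appliers  # graph model type -> command applier

    def dispatch(self, payload: str) -> None:
        """Parse the payload and hand each command to the applier of its graph model type."""
        message = json.loads(payload)
        applier = self.appliers[message["graphModelType"]]
        for raw in message["commands"]:
            applier(parse_command(raw))
```

An unknown `type` or `graphModelType` raises a `KeyError` here; a real server would translate that into an error response.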

The *Command Applier* (see Fig. 6(3)) is the main component of the web server, since it receives, validates, and executes the commands. Every modification encoded by a command is initially validated against the syntactical constraints defined by the graph model type. In the case of a constraint violation, the command is inverted based on the given delta and returned to undo the invalid operation sent from a client. After a successful validation, the modification encoded by the command is applied to the generated domain-specific API, which also triggers the annotated hooks and finally modifies the node or edge instances in the central database. Modifications performed on the API itself (e.g., performed by a hook implementation) are again internally encoded as commands for further distribution to other clients. The updates resulting from the hook execution inside the API are combined with the initial command to be interpreted as a single transaction, shown by the packages of Fig. 6. To ensure the consistency between the sender of a command and the other clients, the initiator is also informed about modifications that arose internally from hook execution. All commands collected during the execution of the initial modification are broadcast to other listening clients (see Fig. 6(4)). This mechanism uses bidirectional ongoing connections, so that clients can request to listen for changes made to their currently open graph model.

**Fig. 6.** Concept of the distributed command pattern. (Color figure online)
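The validate, apply, hook, and broadcast steps can be summarized in a compact sketch. It is written in Python under simplifying assumptions: only move commands exist, the database is a dictionary, and `Move`, `CommandApplier`, `is_valid`, and `post_move_hook` are hypothetical names, not the generated server code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]


@dataclass
class Move:
    sender: str
    node_id: str
    old: Position
    new: Position

    def inverted(self) -> "Move":
        return Move(self.sender, self.node_id, self.new, self.old)


class CommandApplier:
    def __init__(
        self,
        nodes: Dict[str, Position],
        is_valid: Callable[[Move], bool],
        post_move_hook: Callable[[Position], Position],
    ):
        self.nodes = nodes                    # stands in for the central database
        self.is_valid = is_valid              # syntactical constraint check
        self.post_move_hook = post_move_hook  # user-written extension
        self.listeners: List[Callable[[List[Move]], None]] = []

    def receive(self, command: Move) -> List[Move]:
        if not self.is_valid(command):
            # Invalid: return the inverted command to the sender only.
            return [command.inverted()]
        transaction = [command]
        self.nodes[command.node_id] = command.new
        adjusted = self.post_move_hook(command.new)
        if adjusted != command.new:
            # The hook's modification is itself encoded as a command.
            transaction.append(Move(command.sender, command.node_id, command.new, adjusted))
            self.nodes[command.node_id] = adjusted
        for listener in self.listeners:
            listener(transaction)  # broadcast the whole transaction
        return transaction
```

Returning the transaction to the caller mirrors the requirement that the initiator also learns about hook-induced modifications.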

The commands received by a client (see Fig. 6(5)) are parsed and inspected to ensure that commands initiated by the client itself are ignored. New changes from other clients are applied to all layers and displayed on the canvas. In addition to this, the client is notified about received changes. Updates caused as a result of self-sent commands (e.g., a modification performed during a hook execution) are only partially applied, to guarantee that nodes and edges are not modified twice.
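One possible reading of this client-side filtering is sketched below in Python. The `from_hook` flag, the `Received` and `Client` types, and the single `nodes` dictionary (instead of separate layers) are assumptions made for brevity; the text does not specify how Pyro distinguishes hook-induced commands.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Position = Tuple[int, int]


@dataclass
class Received:
    sender: str
    node_id: str
    position: Position
    from_hook: bool = False  # True if the server produced it during hook execution


class Client:
    def __init__(self, client_id: str, nodes: Dict[str, Position]):
        self.client_id = client_id
        self.nodes = dict(nodes)
        self.notifications: List[str] = []

    def move_locally(self, node_id: str, position: Position) -> None:
        self.nodes[node_id] = position  # optimistic local update

    def on_broadcast(self, transaction: List[Received]) -> None:
        foreign = False
        for command in transaction:
            own = command.sender == self.client_id
            if own and not command.from_hook:
                continue  # already applied locally, so ignore it
            self.nodes[command.node_id] = command.position
            foreign = foreign or not own
        if foreign:
            self.notifications.append(f"{len(transaction)} change(s) received")
```

With this rule, the sender applies only the hook-induced follow-up of its own transaction, while every other client applies the full transaction.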

The command pattern applied to the generated modeling environments is tailored to enable real-time collaborative editing. The main design decisions are focused on scalability and high availability by BASE and CRDT. The operational approach realized with this command pattern is more suitable for graphical models than a protocol designed for textual languages, like the *Language Server Protocol* (LSP) [3]. The main difference between the command pattern and the LSP is the way of distributing modifications on the model. In contrast to the presented communication protocol of Pyro, the LSP uses changed regions of a text document for propagation. The intention of the modification has to be evaluated afterwards, whereas in graphical DSLs the commands directly represent the change that occurred.

## **5 Conclusion and Perspectives**

We have presented Pyro, a framework for enabling domain-specific modeling via the internet. Provided with an adequate metamodel specification, Pyro turns a browser into a collaborative, domain-specific, graphical development environment with features reminiscent of desktop IDEs for textual programming languages. The required metamodeling is supported in a high-level, simplicity-driven fashion: The MGL describes the available node types, edge types, and syntactical constraints, whereas the MSL defines the visual appearance of the modeling artifacts defined in the MGL. Based on these specifications, the entire ready-to-run browser-based domain-specific development environment is generated fully automatically, as has been illustrated along the construction of a graphical development environment for the Architecture Analysis and Design Language (AADL).

The field of web-based development environments is still quite young, so that not many related solutions exist yet. There are the aforementioned collaborative online text editors like Google Docs, Microsoft Office 365, and ShareLaTeX/Overleaf, but in the area of DSLs and modeling, so far we have only encountered WebGME [5], an (early stage) online adaptation of Vanderbilt University's Generic Modeling Environment [18], and Theia [4], a cross-platform web and desktop IDE for textual DSLs. In addition, itemis (the German company that significantly contributed to the well-known Xtext [6] DSL framework) is currently working on a platform called 'Convecton', which aims at bringing modeling with and execution of domain-specific languages online into the cloud [35]. However, none of these solutions provides Pyro-like graphical, collaborative modeling support.

Pyro is still in an early stage of development, and there is a lot of room for improvement, like further enhancing and easing the graphical modeling features, or improving the performance of collaborative modeling by taking advantage of peer-to-peer communication. Pyro is envisioned to enable cross-competence collaboration on a single project in a domain/purpose-specific fashion according to the Language-Driven Engineering (LDE) paradigm [31]. LDE aims at allowing the different stakeholders to formulate their intents in the way they are used to, i.e., in their domain language, and restricted in a fashion that the efforts of the other involved stakeholders are maintained, or as we say, constitute Archimedean points [32] of the considered domain-specific language. Currently, we are starting to explore the impact of the Pyro technology on a larger scale for DIME [7], our framework for developing Web applications.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Efficient Model Synchronization by Automatically Constructed Repair Processes**

Lars Fritsche<sup>1(✉)</sup>, Jens Kosiol<sup>2</sup>, Andy Schürr<sup>1</sup>, and Gabriele Taentzer<sup>2</sup>

<sup>1</sup> TU Darmstadt, Darmstadt, Germany {lars.fritsche,andy.schuerr}@es.tu-darmstadt.de

<sup>2</sup> Philipps-Universität Marburg, Marburg, Germany {kosiolje,taentzer}@mathematik.uni-marburg.de

**Abstract.** Model synchronization, i.e., the task of restoring consistency between two interrelated models after a model change, is a challenging task. Triple Graph Grammars (TGGs) specify model consistency by means of rules. They can be used to automatically derive specifications of edit operations for single models and repair rules that propagate model changes to related models. To support model (re-)synchronization activities more effectively, a construction mechanism for *short-cut* rules has been recently developed. They describe consistency-preserving complex edit operations across model boundaries. We show that edit and repair rules can be derived from *short-cut* rules. As proof of concept, we implemented the construction and application of *short-cut* edit and repair rules in eMoflon. Our evaluation shows that *short-cut*-rule-based repair processes have considerably decreased data loss and improved runtime compared to former model synchronization processes in eMoflon.

**Keywords:** Model synchronization · Triple Graph Grammars · Short-cut rule

## **1 Introduction**

Model-driven engineering has become an important technique to cope with the increasing complexity of modern software systems. In the field of Concurrent Engineering [7], for example, products are no longer realized in series but allow parallel tasks. Each of these tasks has its view onto the product and, as a view evolves, it may become inconsistent with the other ones. Keeping views synchronized by checking and preserving their consistency can be a challenging problem which is not only subject to ongoing research but also of practical interest for industrial applications such as stated above.

Triple Graph Grammars (TGGs) [24] are a declarative, rule-based bidirectional transformation approach that aims to synchronize models stemming from different views (usually called *domains* in the TGG literature). Their purpose is to define a consistency relationship between pairs of models in a rule-based manner by defining traces between their elements. Given a finite set of rules that define how both models co-evolve, a TGG can be automatically *operationalized* into *source* and *forward rules*. The source rules of an operationalized TGG can be used to build up models of one domain while forward rules translate them to models of the other domain, thereby establishing traces between their elements. From a synchronization point of view, source rules specify edit operations to change one model while forward rules specify repair operations to synchronize model changes with one another [16,19,24]. Even though both, the translation and the synchronization process, are formally defined and sound, there are in fact several practical issues that arise for model synchronization from (potentially transitive) dependencies between rule applications: To synchronize changed models, popular TGG approaches do not always fix inconsistencies locally but revert all dependent rule applications and start a retranslation process. However, this kind of synchronization often deletes and recreates a lot of model elements to reestablish model consistency, potentially losing information that is local to just one model and wasting processing time. Existing solutions for this problem are rather ad hoc and come without any guarantee to reestablish the consistency of modified models [12,14].

As a new solution to this synchronization problem, we derive *repair rules* from *short-cut* rules [8], which we recently introduced to handle complex consistency-preserving model updates more effectively and efficiently. The construction of *short-cut* rules is a kind of sequential rule composition that makes it possible to replace one rule application with another while preserving the involved model elements (instead of deleting and re-creating them). We used *short-cut* rules to describe model changes that exchange one edit step for another. Since in this paper we want to use *short-cut* rules for model synchronization as well, they have to be operationalized into *source* and *forward* rules.

Our formal contributions (in Sect. 4) are two-fold: As *short-cut* rules may be non-monotonic, i.e., may be deleting, we formalize the operationalization of non-monotonic TGG rules which decomposes short-cut rules into (semantically equivalent sequences of) source (edit) and forward (repair) rules. Moreover, we obtain sufficient conditions under which an application of a *short-cut* rule preserves the consistency of related pairs of models. This was left to future work in [8]. Together, this constitutes the correctness of our approach using operationalized *short-cut* rules for model synchronization.

Practically, we implement our synchronization approach in eMoflon [21], a state-of-the-art bidirectional graph transformation tool, and evaluate it (Sect. 5). The results show that the construction of *short-cut* repair rules enables us to react to model changes in a less invasive way by preserving information and increasing the performance. We thus contribute to a more comprehensive research trend in the bx-community towards *Least Change* synchronization [5]. Before presenting these results in detail, we illustrate our approach using an example in Sect. 2 and recall some preliminaries in Sect. 3. Finally, we discuss related work in Sect. 6 and conclude with pointers to future work in Sect. 7. A technical report that includes additional preliminaries, all proofs, and the rule set used for our evaluation (including more complex examples) is available online [9].

## **2 Introductory Example**

We motivate the use of *short-cut* repair processes by synchronizing a Java AST (abstract syntax tree) model and a custom documentation model. For model synchronization, we consider a Java AST model as *source* model and its documentation model as *target* model, i.e., changes in a Java AST model have to be transferred to its documentation model. There are correspondence links in between such that both models become correlated.

**Fig. 1.** Example: TGG rules (Color figure online)

**Fig. 2.** Example: TGG forward rules

*TGG rules.* Figure 1 shows the rule set of our running example consisting of three TGG rules: *Root-Rule* creates a root *Package* together with a root *Folder* and a correspondence link in between. This rule has an empty precondition and only creates elements which are depicted in green and with the annotation (++). *Sub-Rule* creates a *Package* and *Folder* hierarchy given that an already correlated *Package* and *Folder* pair exists. Finally, *Leaf-Rule* creates a *Class* and a *Doc-File* under the same precondition as *Sub-Rule*.

These rules can be used to generate consistent triple graphs in a synchronized way consisting of source, correspondence, and target graph. A more general scenario of model synchronization is, however, to restore the consistency of a triple graph that has been altered on just one side. For this purpose, each TGG rule has to be operationalized to two kinds of rules: *source* rules enable changes of source models which is followed by translating this model to the target domain with *forward* rules. As *source* rules for single models are just projections of TGG rules to one domain, we do not show them explicitly.

*Forward translation rules.* Figure 2 depicts the *forward* rules. Using these rules, we can translate the Java AST model depicted on the source side of the triple graph in Fig. 3(a) to a documentation model such that the result is the complete graph in Fig. 3(a). To obtain this result we apply *Root-FWD-Rule* at the root *Package*, *Sub-FWD-Rule* at *Packages* p and subP, and finally *Leaf-FWD-Rule* at *Class* c. To guide the translation process, context elements that have already been translated are annotated as translated in *forward* rules. A formerly created source element gets the marking → to indicate that applying the rule will mark this element as translated; a formalization of this marking is given in [20]. Note that *Root-FWD-Rule* can always be applied when *Sub-FWD-Rule* is applicable, which can lead to untranslated edges. For simplicity, we assume that the correct rule is applied, which in practice can be achieved through negative application conditions [15].
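The translation order described above can be illustrated with a minimal sketch. It is not eMoflon's API: the `Node` class and `translate` function are hypothetical, each rule is simplified to create exactly one target element, the choice between *Root-FWD-Rule* and *Sub-FWD-Rule* by position stands in for negative application conditions, and the nesting rootP > p > subP > c is an assumed reading of Fig. 3(a).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str   # "Package"/"Class" (source) or "Folder"/"Doc-File" (target)
    name: str
    children: list = field(default_factory=list)


def translate(node, is_root=True, log=None):
    """Forward-translate a source tree; return (target tree, applied rules)."""
    log = [] if log is None else log
    if node.kind == "Class":
        log.append(("Leaf-FWD-Rule", node.name))
        return Node("Doc-File", node.name), log
    # Choosing the rule by position stands in for negative application conditions.
    log.append(("Root-FWD-Rule" if is_root else "Sub-FWD-Rule", node.name))
    target = Node("Folder", node.name)
    for child in node.children:
        target.children.append(translate(child, False, log)[0])
    return target, log


# Assumed shape of the source side of Fig. 3(a): rootP > p > subP > c.
source = Node("Package", "rootP", [
    Node("Package", "p", [
        Node("Package", "subP", [Node("Class", "c")]),
    ]),
])
target, applied = translate(source)
```

Under these assumptions, `applied` lists one rule application per source element, in the order given in the text.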

**Fig. 3.** Exemplary synchronization scenario

*Model synchronization.* Given the triple graph in Fig. 3(a), a user might want to change a sub *Package* such as p to be a root *Package*, e.g., as could be the case when the project is split up into multiple projects. Since p was created and translated as a sub *Package* rather than a root element, this change introduces an inconsistency. To resolve this issue, one approach is to revert the translation of p into f and re-translate p with an appropriate translation rule such as *Root-FWD-Rule*. Reverting the former translation step may lead to further inconsistencies as we remove elements that were needed as context elements by other rule applications. The consequence is a reversion of all translation steps except for the first one, which translated the original root element. The result is shown in Fig. 3(b). Now, we can re-translate the unmarked elements yielding the result graph in (c). This example reveals two drawbacks of this synchronization approach. First, it may delete and re-create a lot of similar structures, which is inefficient. Second, it may lose information that exists on the target side only, e.g., a use case may be assigned to a document which does not have a representation in the corresponding Java project.
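The cascade of reverted translation steps can be sketched as a transitive closure over dependencies between rule applications. This is only an illustration: the `revoke` function and the `dependents` map are hypothetical, and the dependency chain is an assumed reading of Fig. 3(a) in which each rule application provides the context for the next one.

```python
def revoke(app, dependents):
    """Return `app` plus all rule applications that transitively depend on it."""
    revoked, stack = [], [app]
    while stack:
        current = stack.pop()
        if current not in revoked:
            revoked.append(current)
            stack.extend(dependents.get(current, []))
    return revoked


# Assumed dependencies: each translation step provides context for the next.
dependents = {"rootP": ["p"], "p": ["subP"], "subP": ["c"]}

# Changing p into a root Package invalidates its translation and everything below.
to_retranslate = revoke("p", dependents)
```

In this sketch only the translation of the original root survives, which corresponds to the intermediate state in Fig. 3(b).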

*Model synchronization with short-cut repair.* In [8] we introduced short-cut rules as a kind of rule composition mechanism that makes it possible to replace one rule application with another while preserving elements (instead of deleting and re-creating them). In our example, *Root-Rule* and *Sub-Rule* overlap in elements as the first rule can be completely embedded into the latter one. Figure 4 depicts two possible short-cut rules based on *Root-Rule* and *Sub-Rule*. While the upper short-cut rule replaces *Root-Rule* with *Sub-Rule*, the lower short-cut rule replaces *Sub-Rule* with *Root-Rule*. Both short-cut rules preserve the model elements on both sides and solely create elements that do not yet exist (++), or delete those depicted in red and annotated with (−−). They are constructed by overlapping both original rules such that each created element that can be mapped to the other rule becomes context and, as such, is not touched. When a created element cannot be mapped because it only appears in the replacing rule, it is created. Conversely, a created element that only appears in the replaced rule is deleted. Finally, context elements occurring in both rules also appear in the short-cut rule, while overlapped context elements appear only once. Using *Sub-To-Root-SC-Rule* enables the user to transform the triple graph in Fig. 3(a) directly to the one in (c).

**Fig. 4.** Short-cut rules (Color figure online)

**Fig. 5.** Repair rules
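The classification of created elements described above can be sketched with plain sets. The function `short_cut` and all element names are hypothetical, the element sets are a simplified reading of Fig. 1, and context elements (such as the parent *Package* and *Folder* required by *Sub-Rule*) are deliberately left out.

```python
def short_cut(replaced_created, replacing_created, overlap):
    """Classify the created elements of two rules for a short-cut rule.

    `overlap` maps created elements of the replaced rule to the created
    elements of the replacing rule they are identified with.
    """
    preserved = set(overlap.values())
    deleted = set(replaced_created) - set(overlap)
    created = set(replacing_created) - preserved
    return {"preserve": preserved, "delete": deleted, "create": created}


# Simplified, hypothetical element names for Root-Rule and Sub-Rule.
root_rule = {"package", "folder", "corr"}
sub_rule = {"package", "folder", "corr", "package_edge", "folder_edge"}
identity = {element: element for element in root_rule}

sub_to_root = short_cut(sub_rule, root_rule, identity)
root_to_sub = short_cut(root_rule, sub_rule, identity)
```

With these sets, the rule replacing *Sub-Rule* by *Root-Rule* only deletes the two containment edges, while the opposite direction only creates them.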

Yet, these rules still cannot cope with the change of a single model since short-cut rules transform both models at once, as TGG rules usually do. Hence, in order to be able to handle the deleted edge between rootP and p, we have to forward operationalize short-cut rules, thereby obtaining *short-cut repair* rules. Figure 5 depicts the resulting *short-cut repair* rules derived from the *short-cut* rules in Fig. 4. A non-monotonic TGG rule is forward operationalized by removing deleted elements from the rule's source graphs as they should not be present after a source rule application. *Short-cut repair* rules make it possible to propagate source graph changes directly to target graphs to restore consistency. In our example, after having transformed Package p into a root element, the rule of choice is *Sub-To-Root-Repair-Rule*, which transforms Folder f in Fig. 3(a) into a root element and deletes the superfluous *Doc-File*. The result is again the consistent triple graph depicted in Fig. 3(c). This repair makes it possible to skip the costly reversion process with the intermediate result in Fig. 3(b). Note that applying *Sub-To-Root-Repair-Rule* at arbitrary matches may have undesired consequences: One could, e.g., delete the edge between two *Folders* even if the matched *Packages* are still connected. Our Theorem 8 characterizes matches where such violations of the language of the grammar cannot happen. In our implementation, we exploit an incremental pattern matcher to identify valid matches. Using suitable *negative application conditions* [6] would be an alternative approach.

## **3 Preliminaries**

To understand our formal contributions, we assume familiarity with the basics of double-pushout rewriting in graph transformation and, more generally in adhesive categories [6,18] as well as the definition of TGGs and in particular, their operationalizations [24]. Here, we recall non-basic preliminaries for our work which are the construction of short-cut rules, the notion of sequential independence, and a (simple) categorical definition of partial maps.

In [8], we introduced short-cut rules as a new way of sequential composition for monotonic rules. Given an inverse rule of a monotonic rule (i.e., a rule that only deletes) and a monotonic rule, a short-cut rule combines their respective actions into a single rule. Its construction makes it possible to identify elements that are deleted by the first rule with elements that are re-created by the second one. These elements are preserved in the resulting short-cut rule. A *common kernel*, i.e., a common subrule of both, serves to identify how the two rules overlap and which elements are preserved instead of being deleted and re-created. We recall their construction since our construction of repair rules is based on it. Examples are depicted in Fig. 4.

**Definition 1 (Short-cut rule).** *In an adhesive category* $C$*, given two monotonic rules* $r_i : L_i \hookrightarrow R_i$, $i = 1, 2$*, and a common kernel rule* $k : L_\cap \hookrightarrow R_\cap$ *for them, the* short-cut rule $r_1^{-1} \ltimes_k r_2 := (L \xleftarrow{l} K \xrightarrow{r} R)$ *is computed by executing the following steps depicted in Figs. 6 and 7:*


**Fig. 6.** Construction of LHS and RHS of short-cut rule $r_1^{-1} \ltimes_k r_2$

**Fig. 7.** Construction of interface $K$ of $r_1^{-1} \ltimes_k r_2$

Sequential independence of two rule applications intuitively means that none of these applications enables the other one. This implies that the order of their application may be switched. The definition of sequential independence can be extended to a sequence of rule applications longer than 2. In Theorem 8, we will use this to identify language-preserving applications of short-cut rules.

**Definition 2 (Sequential independence).** *Given two rules* $p_i = (L_i \xleftarrow{l_i} K_i \xrightarrow{r_i} R_i)$ *with* $i = 1, 2$*, two direct transformations* $G \Rightarrow_{p_1,m_1} H_1$ *and* $H_1 \Rightarrow_{p_2,m_2} H_2$ *via the rules* $p_1$ *and* $p_2$ *are* sequentially independent *if there exist two morphisms* $d_1 : R_1 \to D_2$ *and* $d_2 : L_2 \to D_1$ *as depicted below such that* $n_1 = f_2 \circ d_1$ *and* $m_2 = f_1 \circ d_2$*.*

$$\begin{array}{ccccccccccc}
L_1 & \xleftarrow{l_1} & K_1 & \xrightarrow{r_1} & R_1 & & L_2 & \xleftarrow{l_2} & K_2 & \xrightarrow{r_2} & R_2 \\
{\scriptstyle m_1}\downarrow & & \downarrow & & & {\scriptstyle n_1}\searrow \ \swarrow{\scriptstyle m_2} & & & \downarrow & & \downarrow{\scriptstyle n_2} \\
G & \longleftarrow & D_1 & \longrightarrow & & H_1 & & \longleftarrow & D_2 & \longrightarrow & H_2
\end{array}$$

*Given rules* $p = (L \leftarrow K \rightarrow R)$ *and* $p_i = (L_i \leftarrow K_i \rightarrow R_i)$ *with* $1 \le i \le t$*, a transformation* $G_t \Rightarrow_{p,m} H$ *is* sequentially independent from a sequence of transformations $G_0 \Rightarrow_{p_1,m_1} G_1 \Rightarrow_{p_2,m_2} \dots \Rightarrow_{p_t,m_t} G_t$, $t \ge 2$*, if first,* $G_t \Rightarrow_{p,m} H$ *and* $G_{t-1} \Rightarrow_{p_t,m_t} G_t$ *are sequentially independent and then, the arising transformations* $G_{t-1} \Rightarrow_{p,\,e_t \circ d_2^t} G'_t$ *and* $G_{t-2} \Rightarrow_{p_{t-1},m_{t-1}} G_{t-1}$ *are sequentially independent and so forth back to the transformations* $G_0 \Rightarrow_{p_1,m_1} G_1$ *and* $G_1 \Rightarrow_{p,\,e_2 \circ d_2^2} G'_2$ *(where* $e_i : D_i \hookrightarrow G_{i-1}$ *is given by the transformation and* $d_2^i : L \hookrightarrow D_i$ *exists by sequential independence as in the figure above).*

To formalize the application of non-monotonic TGG rules, we need to consider triple graphs with partial morphisms from correspondence to source (or target) graphs. For expressing such triple graphs categorically, we recall a simple definition of partial morphisms [23] to be used in Sect. 4.1. An elaborated theory of triple graphs with partial morphisms is out of scope of this paper.

**Definition 3 (Partial morphism. Commuting square with partial morphisms).** *A* partial morphism $a$ *from an object* $A$ *to an object* $B$ *is a(n equivalence class of) span(s)* $A \xleftarrow{\iota_A} A' \xrightarrow{a} B$ *where* $\iota_A$ *is a monomorphism (denoted by* $\hookrightarrow$*). A partial morphism is denoted as* $a : A \dashrightarrow B$*;* $A'$ *is called the* domain *of* $a$*. A diagram with two partial morphisms* $a$ *and* $c$ *as depicted as square* (1) *in Fig. 8 is said to be* commuting *if there exists a (necessarily unique) morphism* $x : A' \to C'$ *such that both arising squares* (2) *and* (3) *in Fig. 9 commute.*

$$\begin{array}{ccc} A & \overset{a}{\dashrightarrow} & B \\ {\scriptstyle f}\downarrow & (1) & \downarrow \\ C & \underset{c}{\dashrightarrow} & D \end{array}$$

**Fig. 8.** Square of partial morphisms

**Fig. 9.** Commuting square of partial morphisms
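For finite sets, the commutation condition of Definition 3 can be checked directly. The sketch below is only an illustration under assumptions: partial maps are dictionaries defined on their domain, the name `g` for the right vertical morphism is hypothetical, and the squares (2) and (3) are read as "the restriction of $f$ to the domain of $a$ lands in the domain of $c$" and "both paths into $D$ agree".

```python
def commutes(a, c, f, g):
    """Check the square A -a-> B, C -c-> D with f: A -> C and g: B -> D.

    `a` and `c` are partial maps (dicts defined only on their domain);
    `f` and `g` are total maps (dicts defined on all of A and B).
    """
    for element, image in a.items():
        x = f[element]        # candidate value of x, the restriction of f
        if x not in c:        # x must land in the domain of c
            return False
        if g[image] != c[x]:  # both paths into D must agree
            return False
    return True


# Correspondence node c2 has lost its source element, so `a` is partial.
a = {"c1": "s1"}
f = {"c1": "c1", "c2": "c2"}
g = {"s1": "s1"}

ok = commutes(a, {"c1": "s1"}, f, g)
broken = commutes(a, {}, f, g)
```

The second call fails because the element `c1`, which lies in the domain of `a`, is mapped outside the (empty) domain of the lower partial map.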

## **4 Constructing Language-Preserving Repair Rules**

The general idea of this paper is to use *short-cut repair* rules allowing an optimized model synchronization process based on TGGs. To this end, we operationalize short-cut rules being constructed from the rules of a given TGG. Since those rules are not necessarily monotonic, we generalize the well-known operationalization of TGG rules to the non-monotonic case and show that the basic property is still valid: An application of a source rule followed by an application of the corresponding forward rule is equivalent to applying the original rule instead. This is the content of Sect. 4.1. Constructing *short-cut* rules in [8], we identified the following problem: Applying a short-cut rule derived from rules of a given grammar might lead to an instance that is not part of the language defined by that grammar. Therefore, in Sect. 4.2, we provide sufficient conditions for applications of short-cut rules leading to instances of the grammar-defined language only. Combining both results ensures the correctness of our approach, i.e., a *short-cut* repair rule actually propagates a model change from the source to the target model if it is correctly matched.

#### **4.1 Operationalization of Generalized TGG Rules**

Since the operationalization of TGG rules has been introduced for monotonic rules only, we extend the theory to general triple rules and, moreover, allow for partial morphisms from correspondence to source and target graph in triple graphs. We split a rule on triple graphs into a *source rule* that only affects the source part and a *forward rule* that affects correspondence and target part.

**Definition 4 (TGG rule).** *Let the category of triple graphs and graph morphisms be given. A triple rule* p *is a span of triple graph morphisms*

$$p = (L_S \xleftarrow{\sigma_L} L_C \xrightarrow{\tau_L} L_T) \xleftarrow{(l_S, l_C, l_T)} (K_S \xleftarrow{\sigma_K} K_C \xrightarrow{\tau_K} K_T) \xrightarrow{(r_S, r_C, r_T)} (R_S \xleftarrow{\sigma_R} R_C \xrightarrow{\tau_R} R_T)$$

*which, wherever possible, are abbreviated by*

$$p = (L_{SCT} \xleftarrow{(l_S, l_C, l_T)} K_{SCT} \xrightarrow{(r_S, r_C, r_T)} R_{SCT}).$$

*Rules* $p_S$ *and* $p_F$ *are called* source rule *and* forward rule *of* $p$*.*

$$p_S = ((L_S \leftarrow \emptyset \to \emptyset) \xleftarrow{(l_S, id_{\emptyset}, id_{\emptyset})} (K_S \leftarrow \emptyset \to \emptyset) \xrightarrow{(r_S, id_{\emptyset}, id_{\emptyset})} (R_S \leftarrow \emptyset \to \emptyset)),$$

$$p_F = (R_S L_{CT} \xleftarrow{(id_{R_S}, l_C, l_T)} R_S K_{CT} \xrightarrow{(id_{R_S}, r_C, r_T)} R_{SCT})$$

*with* $\emptyset$ *being the empty graph. In* $R_S L_{CT} = (R_S \dashleftarrow L_C \xrightarrow{\tau_L} L_T)$*, the morphism from* $L_C$ *to* $R_S$ *may be partial and is defined by the span* $(L_C \xleftarrow{l_C} K_C \xrightarrow{r_S \circ \sigma_K} R_S)$ *with* $\sigma_K : K_C \to K_S$*.* Target *and* backward rules $p_T$ *and* $p_B$ *are defined symmetrically in the other direction.*

*Given a TGG, a* short-cut repair rule *is a forward rule* $p_F$ *of a short-cut rule* $p = r_1^{-1} \ltimes_k r_2$ *where* $r_1, r_2$ *are (monotonic) rules of the TGG, i.e., a repair rule is an operationalized short-cut rule.*

The above definition is motivated by our application scenario, i.e., the case where a user edits the source (or target) model independently of the other parts. The partial morphism in the forward rule reflects that a model change may introduce a situation where the result is no longer a triple graph: a deleted source element may have a preimage in the correspondence graph that is not deleted as well. In the example *short-cut* rules in Fig. 4, this problem does not occur since only edges are deleted. But in general, this definition of $p_S$ has the disadvantage that $p_S$ is often not applicable to any triple graph since the result would not be one.

In practical applications, however, the source rule specifies a user edit action that is performed on the source part only, ignoring correspondence and target graphs. The fact that the result is not a triple graph any longer is not a technical problem. A missing source element that should be referenced by a correspondence element gives information about a location that needs some repair. Therefore, we define the application of a source rule such that the resulting triple graph is allowed to be partial. Furthermore, forward rules may be applied to partial triple graphs allowing for dangling correspondence relations.

**Definition 5 (Constructing an operationalized rule application).** *Let a triple graph rule* $p = (L_{SCT} \xleftarrow{(l_S, l_C, l_T)} K_{SCT} \xrightarrow{(r_S, r_C, r_T)} R_{SCT})$ *with source rule* $p_S$ *and forward rule* $p_F$ *be given. An operationalized rule application* $G \Rightarrow_{p_S,m_S} G' \Rightarrow_{p_F,m_F} H$ *is constructed as follows:*


$H_S$*, called* source application *and inducing the span* $G_S \xleftarrow{f_S} D_S \xrightarrow{g_S} H_S$*.*


*and if there are morphisms* $\sigma_D : D_C \to H_S$ *and* $\tau_D : D_C \to D_T$ *such that* $H_S D_C D_T \hookrightarrow H_S G_C G_T$ *and* $R_S K_C K_T \hookrightarrow H_S D_C D_T$ *are triple morphisms.*

**Fig. 10.** Retrieval of partial morphism $G_C \dashrightarrow H_S$

In the setting of this paper, it is enough to allow for partial morphisms only in the input graph and not in the output graph of a forward rule application. Intuitively this means that such an application deletes those elements from the correspondence graph that could not be mapped to elements in the source graph any longer and additionally deletes the preimages in the correspondence graph of all deleted elements from the target graph as well (if there are any). The next lemma states that the application of a source rule is well-defined, i.e., that the mentioned partial morphism actually exists.

**Lemma 6 (Correctness of application of source rules).** *Let a (non-monotonic) triple graph rule*

$$p = (L_{SCT} \xleftarrow{(l_S, l_C, l_T)} K_{SCT} \xrightarrow{(r_S, r_C, r_T)} R_{SCT})$$

*with source rule* $p_S$ *and projection* $p_S^{pr}$ *to the source part be given. Given a match* $m_S$ *for* $p_S$ *to a triple graph* $G = (G_S \xleftarrow{\sigma_G} G_C \xrightarrow{\tau_G} G_T)$ *such that* $G_S \Rightarrow_{p_S^{pr},m_S} H_S$*, the partial morphism* $D_C \dashrightarrow H_S$ *as described in Definition 5 exists.*

The next theorem states that a sequential application of a source and a forward rule indeed coincides with an application of the original rule as long as the matches are consistent. This means that the forward rule has to match the RHS $R_S$ of the source rule again and the LHS $L_C$ of the correspondence rule needs to be matched in such a way that all elements not belonging to the domain of the partial morphism from correspondence to source part in the input model are deleted. The forward rule application defined in Definition 5 fulfills this condition by construction.

**Theorem 7 (Synthesis of rule applications).** *Let a triple graph rule* $p$ *with source and forward rules* $p_S$ *and* $p_F$ *be given. If there are applications* $G \Rightarrow_{p_S,m_S} G'$ *with co-match* $n_S$ *and* $G' \Rightarrow_{p_F,m_F} H$ *with* $m_F = (n_S, m_C, m_T)$ *as constructed above, then there is an application* $G \Rightarrow_{p,m} H$ *with* $m = (m_S, m_C, m_T)$*.*
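The statement of Theorem 7 can be illustrated on a strongly simplified, set-based model of triple graphs. This sketch ignores matches, morphisms, and dangling conditions; the functions `apply` and `operationalize`, the rule encoding, and all element names are hypothetical and only mimic the split of a rule into a source part and a correspondence/target part.

```python
DOMAINS = ("S", "C", "T")


def apply(graph, rule):
    """Apply a rule given as per-domain 'delete' and 'create' element sets."""
    return {
        d: (graph[d] - rule["delete"].get(d, set())) | rule["create"].get(d, set())
        for d in DOMAINS
    }


def operationalize(rule):
    """Split a rule into a source rule (S only) and a forward rule (C and T)."""
    def restrict(domains):
        return {
            action: {d: elems for d, elems in rule[action].items() if d in domains}
            for action in ("delete", "create")
        }
    return restrict({"S"}), restrict({"C", "T"})


# Hypothetical non-monotonic rule in the spirit of Sub-To-Root-SC-Rule.
sc_rule = {"delete": {"S": {"rootP->p"}, "T": {"rootF->f"}}, "create": {}}
graph = {
    "S": {"rootP", "p", "rootP->p"},
    "C": {"rootP~rootF", "p~f"},
    "T": {"rootF", "f", "rootF->f"},
}

source_rule, forward_rule = operationalize(sc_rule)
after_edit = apply(graph, source_rule)          # user edit on the source side
after_repair = apply(after_edit, forward_rule)  # repair of the other two parts
```

In this toy setting, the source rule changes only the source part, and applying the forward rule afterwards yields the same result as applying the original rule in one step.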

#### **4.2 Language-Preserving Short-Cut Rules**

In this section we identify sufficient conditions for an application of a short-cut rule that guarantee the result to be an element of the language of the original grammar. Since our conditions apply to arbitrary adhesive categories and are not specific for TGGs, we present the result in its general form.

**Theorem 8 (Characterization of valid applications).** *In an adhesive category* C*, given a sequence of transformations*

$$G \Rightarrow_{r,m} G_0 \Rightarrow_{p_1, m_1} G_1 \Rightarrow_{p_2, m_2} \dots \Rightarrow_{p_t, m_t} G_t \Rightarrow_{r^{-1} \ltimes_k r', m_{sc}} H$$

*with rules* $p_1, \dots, p_t$ *and* $r^{-1} \ltimes_k r'$ *being the short-cut rule of monotonic rules* $r : L \hookrightarrow R$ *and* $r' : L' \hookrightarrow R'$ *along a common kernel* $k$*, there is a match* $m'$ *for* $r'$ *in* $G$ *and a transformation sequence*

$$G \Rightarrow\_{r',m'} G\_1' \Rightarrow\_{p\_1, m\_1'} \dots \ G\_{t-1}' \Rightarrow\_{p\_t, m\_t'} H,$$

*provided that*


*In particular, given a grammar* $GG = (\mathcal{R}, S)$ *such that* $r, r', p_1, \dots, p_t \in \mathcal{R}$ *and* $G \in \mathcal{L}(GG)$*, then* $H \in \mathcal{L}(GG)$*.*

Independence of the short-cut rule application $t_{sc} : G_t \Rightarrow_{r^{-1} \ltimes_k r', m_{sc}} H$ from the preceding transformation sequence $t : G \Rightarrow G_t$ requires the existence of morphisms in two directions: morphisms $d_2^i$ from the LHS of the short-cut rule to the context objects $D_i$ arising in $t$ and morphisms $d_1^i$ from the right-hand sides $R_i$ of the rules $p_i$ to the context object of $t_{sc}$ (shifted further and further to the beginning of the sequence). In the case of (typed triple) graphs, the existence of morphisms $d_2^i$ ensures that none of the rule applications in $t$ enabled the transformation $t_{sc}$. The existence of morphisms $d_1^i$ ensures that the transformation $t_{sc}$ does not delete structure needed to perform the transformation sequence $t$.
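For graphs with injective matches, the two requirements above can be approximated by simple set checks. The sketch below is an assumption-laden illustration and not the categorical definition: it does not shift matches along the sequence, and the dictionary keys (`created`, `co_matched`, `matched`, `deleted`) and all element names are hypothetical.

```python
def independent(step, later):
    """Set-based check that `later` neither depends on nor disables `step`."""
    enabled = step["created"] & later["matched"]      # later uses what step created
    disabled = step["co_matched"] & later["deleted"]  # later deletes what step used
    return not enabled and not disabled


def independent_of_sequence(steps, later):
    """Check `later` against every step of the preceding sequence."""
    return all(independent(step, later) for step in steps)


# Translation of subP below p (hypothetical element names).
translate_subP = {
    "created": {"subP", "subF", "p->subP", "f->subF", "subP~subF"},
    "co_matched": {"p", "f", "p~f", "subP", "subF",
                   "p->subP", "f->subF", "subP~subF"},
}
# Short-cut rule application that turns p and f into root elements.
sub_to_root_at_p = {
    "matched": {"rootP", "p", "rootF", "f", "rootP~rootF", "p~f",
                "rootP->p", "rootF->f"},
    "deleted": {"rootP->p", "rootF->f"},
}
# A change that removes p itself, which the translation of subP needs.
delete_p = {"matched": {"p"}, "deleted": {"p"}}

valid = independent_of_sequence([translate_subP], sub_to_root_at_p)
invalid = independent_of_sequence([translate_subP], delete_p)
```

The first application only deletes edges that the later translation step never used, so it passes the check; the second one removes context that the step depends on and is rejected.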

*Application to model synchronization.* The results in Theorems 7 and 8 are the formal basis for an automatic construction of repair rules. Theorem 7 ensures that a suitable edit action followed by application of a repair rule at the right match is equivalent to the application of a short-cut rule. Thus, whenever an edit action on the source model (or symmetrically the target model) corresponds to the source-action (target-action) of a short-cut rule, application of the corresponding forward (backward) rule synchronizes the model again. Since the language of a TGG is defined by its rules, every valid model can be reached from every other valid model by inverse application of some of the rules of the grammar followed by normal application of some rules. Often, edit actions are rather small steps (or at least consist of those). Thus, it is not unreasonable to expect that many typical edit actions can be realized as short-cut rules as these formalize the inverse application of a rule followed by application of a normal one. Theorem 8 characterizes the matches for short-cut rules at which application stays in the language of the TGG. For operational short-cut rules, this can either be used for detecting invalid edit actions or determining valid matches for synchronizing forward rules.

## **5 Implementation and Evaluation**

*Implementation.* Our implementation<sup>1</sup> of an optimized model synchronizer is based on the existing EMF-based general purpose graph and model transformation tool eMoflon [21]. It offers support for rule-based unidirectional and bidirectional graph transformations where the latter is based on TGGs. To support an effective model synchronizer, we automatically calculate a small but useful subset of all possible short-cut rules. This is done by overlapping as many created elements as possible and only varying the way in which context elements are mapped onto each other. These selected short-cut rules are operationalized to get repair rules that allow us to repair broken links similar to our example in Sect. 2. The model synchronization process is based on an *incremental graph pattern matcher* that tracks all matches that appear or disappear due to model changes. Thus, it offers the ability to react to model changes without the need to recompute matches from scratch. Our implementation uses this technique by processing all those matches marked as broken by the pattern matcher after a model change. A broken match is the starting point to find a repair match as it is defined by the co-match of the performed model change and has to be extended. If the pattern matcher can extend a broken match to a repair match, the corresponding *short-cut* repair rule can be applied. Otherwise, we fall back to the old synchronization strategy of revoking the current step. This fully automated synchronization process ensures that we are able to restore consistency as long as the edited domain model still resides in the language of our TGG.
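The control flow of this synchronization strategy can be summarized in a short sketch. It does not use eMoflon's actual API: `synchronize` and its three callbacks are hypothetical placeholders for the pattern matcher's extension of a broken match, the application of a repair rule, and the legacy revocation step.

```python
def synchronize(broken_matches, find_repair_match, apply_repair, revoke_and_retranslate):
    """Process broken matches; prefer short-cut repair, fall back to revocation."""
    outcome = []
    for broken in broken_matches:
        repair_match = find_repair_match(broken)  # try to extend the broken match
        if repair_match is not None:
            apply_repair(repair_match)
            outcome.append((broken, "repaired"))
        else:
            revoke_and_retranslate(broken)        # legacy strategy
            outcome.append((broken, "reverted"))
    return outcome


# Hypothetical stand-ins for the pattern matcher and the rule engine.
known_repairs = {"Sub-Rule@p": "Sub-To-Root-Repair-Rule@p"}
applied, reverted = [], []
result = synchronize(
    ["Sub-Rule@p", "Leaf-Rule@c"],
    known_repairs.get,
    applied.append,
    reverted.append,
)
```

Passing the strategy steps as callbacks keeps the sketch independent of any concrete matcher; a broken match without a repair match simply takes the fallback branch.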

*Evaluation.* Our experimental setup consists of 23 TGG rules (shown in our technical report [9]) that specify consistency between Java AST and custom documentation models and 37 short-cut rules derived from our TGG rule set. A small modified excerpt of this rule set was given in Sect. 2. For this evaluation, however, we define consistency not only between *Package* and *Folder* hierarchies but also between type definitions, e.g., *Classes* and *Interfaces*, and *Methods* with their corresponding documentation entries. We extracted five models from Java projects hosted on GitHub using the tool MoDisco [4] and translated them into our own documentation structure. Also, we generated five synthetic models consisting of n-level *Package* hierarchies with each non-leaf *Package* containing five sub-*Packages* and each leaf *Package* containing five *Classes*. Given such Java models, we refactored each model in three different scenarios, such as moving a *Class* from one *Package* to another or completely relocating a *Package*. Then we used eMoflon to synchronize these changes in order to restore consistency to the documentation model, with and without *repair rules*.

<sup>1</sup> Both the implementation and evaluation workspace can be accessed via https://github.com/Arikae00/FASE19 eMoflon-evaluation.
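The size of the synthetic models follows from the description of the hierarchy. As a rough sketch, assuming a single root *Package* at level 1 and leaf *Packages* at level n (the function name `synthetic_size` and this level numbering are assumptions, not taken from the evaluation workspace), the element counts can be computed as follows.

```python
def synthetic_size(levels, fanout=5):
    """Return (number of Packages, number of Classes) of an n-level hierarchy.

    Assumes one root Package at level 1; every non-leaf Package has `fanout`
    sub-Packages and every leaf Package (level n) has `fanout` Classes.
    """
    packages = sum(fanout ** i for i in range(levels))
    leaf_packages = fanout ** (levels - 1)
    classes = leaf_packages * fanout
    return packages, classes


sizes = {n: synthetic_size(n) for n in range(1, 5)}
```

Under these assumptions, a 3-level model has 1 + 5 + 25 = 31 *Packages* and 25 · 5 = 125 *Classes*, so the model grows by roughly a factor of five per level.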

These synchronization steps are subject to our evaluation and we pose the following research questions: **(RQ1)** *For different kinds of changes, how many elements can be preserved that would otherwise be deleted and recreated?* **(RQ2)** *How does our new approach affect the runtime performance?* **(RQ3)** *Are there specific scenarios in which our approach performs particularly well or poorly?*

*Repair rules* were developed to avoid unnecessary deletions of elements caused by reverting too many rule applications in order to restore consistency, as exemplified in Sect. 2. Hence, model changes for which our approach should perform particularly well have to target rule applications close to the beginning of a rule sequence, as this possibly renders many rule applications invalid. For instance, altering a root *Package* by creating a new *Package* as root implies that many rule applications have to be reverted to synchronize the changes correctly (Scenario 1). In contrast, our approach might perform poorly when a model change does not cause a large cascade of invalid rule applications. Hence, we move *Classes* between *Packages* to measure whether the effort of applying *repair rules* incurs a performance loss when neither the new nor the old algorithm has to repair many broken rule applications (Scenario 2). Finally, we simulate a scenario between the first two by relocating leaf *Packages* (Scenario 3).


**Table 1.** Legacy vs. new synchronizer – Time in sec. and number of created elements

Table 1 depicts the measured times (Sec) and the number of created elements (Elts) in each scenario. Each created element also represents a deleted element, e.g., through revoking and reapplying a rule or applying a repair rule that creates and deletes elements. In more detail, the table shows measurements for the initial translation of the MoDisco model into the documentation structure and synchronization steps for each scenario using the legacy synchronizer without *repair rules* and the new synchronizer with *repair rules*.

W.r.t. our research questions stated above, we interpret this table as follows: The right columns of the table show clearly that using repair rules preserves all those elements in our scenarios that would otherwise be deleted and recreated by the legacy algorithm<sup>2</sup> **(RQ1)**. The runtime shows a significant performance gain for Scenario 1, which includes a worst-case model change **(RQ2)**. *Repair rules* do not introduce an overhead compared to the legacy algorithm, as can be seen from the time measurements for the synthetic models in Scenario 3, where only one rule application has to be repaired or reapplied **(RQ2)**. Our new approach excels when the cascade of invalidated rule applications is long. Even if this is not the case, it does not introduce any measurable overhead compared to the legacy algorithm, as shown in Scenarios 2 and 3 **(RQ3)**.

*Threats to validity.* Our evaluation is based on five real world and five synthetic models. Of course, there exists a wide range of projects that differ significantly from each other due to their size, purpose, and developer styles. Thus, the results may differ for other projects. Nonetheless, we argue that the four larger projects extracted from GitHub are representative since they are part of established tools from the Eclipse community. In this evaluation, we selected three edit operations that are representative w.r.t. their dependency on other edit operations. They may not be representative w.r.t. other aspects such as size or kind of change, which seem to be of minor importance in this context. Also, we limited our evaluation to one TGG rule set due to space constraints. However, in our experience the approach shows similar results for a broader range of TGGs, which can be accessed through eMoflon.

## **6 Related Work**

*Reuse in existing work on TGGs.* Several approaches to model synchronization based on TGGs suffer from the fact that the revocation of a certain rule application triggers the revocation of all dependent rule applications as well [12,16,19]. Especially from a practical point of view such cascades of deletions shall be avoided: In [10], Giese and Hildebrandt propose rules that save nodes instead of deleting and then re-creating them. Their examples can be realized by our construction of *repair rules*. But they do not present a general construction or proof of correctness. This is left as future work in [11] again, where other aspects of [10] are formalized and proven to be correct.

In [3], Blouin et al. added a specially designed repair rule to the rules of their case study to avoid information loss. Greenyer et al. [14] also propose to not directly delete elements but to mark them for deletion and allow for reuse of these marked elements in other rule applications. But this approach comes without any formalization or proof of correctness as well. Again, the given example can be realized as short-cut repair. These uncontrolled and informal approaches are potentially harmful. Re-using elements wrongly may lead to, e.g., containment cycles or unconnected data. Hence, providing precise and sufficient conditions for correct re-use of data is highly desirable as re-use may improve scalability and decrease data-loss. Our short-cut rules formalize when data can be correctly reused. In summary, we do not only offer a unifying principle behind different practically used improvements of TGGs but also give a precise formalization that allows for automatic construction of the rules needed. Thereby, we present conditions under which rule applications lead to valid outputs.

<sup>2</sup> Scenario 1: We expect the new root element to already be translated.

*Comparison to other bx approaches.* Anjorin et al. [2] compared three state-of-the-art bx tools, namely eMoflon [21] (rule-based), mediniQVT [1] (constraint-based) and BiGUL [17] (bx programming language) w.r.t. model synchronization. They point out that synchronization with eMoflon is faster than with both other tools as the runtime of these tools correlates with the overall model size while the runtime of eMoflon correlates with the size of the changes done by edit operations. Furthermore, eMoflon was the only tool able to solve all but one synchronization scenario. One scenario was not solved because it deleted more model elements than absolutely necessary in that case. Using short-cut repair rules, we can solve the remaining scenario and, moreover, can further increase eMoflon's model synchronization performance.

*Change-preserving model repair.* Change-preserving model repair as presented in [22,25] is closely related to our approach. Assuming a set of consistency-preserving rules and a set of edit rules to be given, each edit rule is accompanied by one or more repair rules completing the edit step, if possible. Such a complement rule is considered as repair rule of an edit rule w.r.t. an overarching consistency-preserving rule. Operationalized TGG rules fit into that approach but provide more structure: As graphs and rules are structured in triples, a source rule is also an edit rule being complemented by a forward rule. In contrast to that approach, source and forward rules can be automatically deduced from a given TGG rule. By our use of short-cut rules we introduce a pre-processing step to first enlarge the sets of consistency-preserving rules and edit rules.

*Generalization of correspondence relation.* Golas et al. provide a formalization of TGGs in [13] which allows to generalize correspondence relations between source and target graphs as well. They use special typings for the source, target, and correspondence parts of a TGG and for edges between a correspondence part and source and target part instead of using graph morphisms. That approach also allows for partial correspondence relations. But it makes the deletion of elements more complex as it becomes important how many incident edges a node has (at least in the double-pushout approach). We therefore opted for introducing triple graphs with partial morphisms. They allow us to just delete a node without caring if it is needed within an existing correspondence relation.

## **7 Conclusion**

Model synchronization, i.e., the task of restoring consistency between two models after a model change, poses challenges to modern bx approaches and tools: We expect them to synchronize changes without losing data in the process, thus preserving information, and furthermore, we expect them to show a reasonable performance. While Triple Graph Grammars (TGGs) provide the means to perform model synchronization tasks in general, both requirements cannot always be fulfilled since basic TGG rules do not define adequate means to support intermediate model editing. Therefore, we propose additional edit operations, namely short-cut rules, a special form of generalized TGG rules that allow taking back one edit action and performing an alternative one. In our evaluation, we show that operationalized short-cut rules allow for a model synchronization with considerably decreased data loss and improved runtime.

To better cope with practical application scenarios, we would like to extend our approach by formally incorporating type inheritance, application conditions and attributes in the model synchronization process. Since all of these have been formalized in the setting of (M-)adhesive categories and our present work uses that framework as well, these extensions are prepared but left to future work. Propagating changes from one domain to another is basically done here by operationalizing short-cut rules. A more challenging task is what we call model integration, where related pairs of models are edited concurrently and have to be synchronized. These model edits may be in conflict across model boundaries. It is up to future work to allow short-cut rules in model integration. Our hope is to decrease data loss and to improve runtime of model integration tasks as well.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Offline Delta-Driven Model Transformation with Dependency Injection**

Artur Boronat

Department of Informatics, University of Leicester, Leicester, UK aboronat@le.ac.uk

**Abstract.** When model transformations are used to implement consistency relations between very large models (VLMs), incrementality plays a cornerstone role in the realization of practical consistency maintainers. State-of-the-art model transformation engines with support for incrementality normally rely on a publish-subscribe model for linking model updates (deltas) to the application of model transformation rules, in so-called dependencies, at run time. These deltas can then be propagated along an already executed model transformation. A small number of such engines use domain-specific languages (DSLs) for representing model deltas offline in order to enable their use in asynchronous, event-based execution environments.

The principal contribution of this work is the design of a forward delta propagation mechanism for incremental execution of model transformations, which decouples dependency tracking from delta propagation using two innovations. First, the publish-subscribe model is replaced with dependency injection, physically decoupling domain models from consistency maintainers. Second, a standardized representation of model deltas is reused, facilitating interoperability with EMF-compliant tools, both for defining deltas and for processing them asynchronously. This procedure has been implemented in a model transformation engine, whose performance has been evaluated empirically using the VIATRA CPS benchmark. In the experiments performed, the new transformation engine shows gains of several orders of magnitude in the initial phase of the incremental execution of the benchmark model transformation, and delta propagation is performed in real time, independently of the size of the models involved, whereas the performance of the best-performing approach to date depends on model size.

**Keywords:** Mappings between languages · Traceability · Incremental model transformation · Performance benchmark

## **1 Introduction**

Significant issues in the application of Model-Driven Engineering (MDE) in large-scale industrial problems stem from interoperability and scalability of current MDE tools [1,16,17]. Model transformation, widely accepted as the *heart and soul* of MDE [23], deals with model manipulation either by translating models or by synchronizing them. Current tool support for model transformation is a key root cause for many of the bottlenecks hampering scalability in MDE [2,8]. This is particularly crucial when transformations are used to implement consistency maintainers between very large models (VLMs), consisting of millions of elements. In this context, incrementality ensures that only those parts of the model that are inconsistent or that have been modified (a model delta) are transformed or, more precisely, propagated along an already executed transformation [11,12].

Current state-of-the-art approaches that support incremental execution of model transformations share common features: the delta propagation mechanism is usually decoupled from the delta detection mechanism in order to facilitate maintainability of the consistency maintainer; and deltas are represented either in memory for synchronous notification or offline, with dedicated domain-specific languages, for asynchronous notification. The most mature tools rely on a publish/subscribe mechanism, where model deltas are notified at run time whenever a model is updated. This notification mechanism is synchronous and loosely couples model updates with the delta propagation mechanism, facilitating maintainability of the underlying transformation engine after fixing the type of notification. However, it usually requires an observer for each object that can be modified, with a consequent impact on performance, and the model transformation must be live, in memory, in order to listen for changes. These problems can be avoided by using offline deltas. The publish/subscribe mechanism can be extended to enable asynchronous delta notification but this is normally achieved by using dedicated domain-specific languages to represent deltas offline, which do not involve standardized formats, hindering the interoperability of those transformation engines in existing modeling tool ecosystems.

In this paper, the design of a forward delta propagation procedure is presented for executing model transformations in incremental mode that can handle documented change scenarios [4], i.e. documents representing a change to a given source model. Such documents are defined with the EMF change model [24], both conceptually and implementation-wise, guaranteeing interoperability with EMF-compliant tools. This design decision replaces a publish/subscribe notification with dependency injection: each notification is directly performed by the implementation of the domain model at run time by injecting the dependency corresponding to the model update that has been performed. Aspect-oriented programming is used to weave code into an already existing implementation of a domain model totally decoupling domain models from the consistency maintainer at design time. The proposed forward delta propagation procedure has been implemented in YAMTL [6], a model transformation engine for VLMs, enabling the execution of model transformations both in batch mode and in incremental mode without additional user specification overhead. This new extension dramatically improves the performance of the batch execution mode when dealing with sparse model deltas, which can be propagated in real time (i.e. in μs.).

This work is structured as follows: Sect. 2 provides a self-contained description of the class of model transformations supported using a class diagram to relational schema model transformation; Sect. 3 presents the forward propagation procedure implemented in the model transformation engine together with the main innovations; Sect. 4 discusses the performance of the transformation engine with an adaptation of the VIATRA CPS benchmark; Sect. 5 discusses related work from reactive and bidirectional model transformation.

## **2 Model Transformation: A Running Example**

The type of model transformations that are considered in this work are classified as unidirectional and out-place. For example, when considering the well-known example that maps class diagrams to relational schemas, a class diagram is used by queries to extract information and a relational schema is built from scratch. If we consider a graph transformation perspective, both models are considered to form part of the same graph in order to enable transformation by rewriting. In that case, we are only considering transformations where the two models are two clearly disjoint subgraphs and where rewriting is performed deterministically.

In this work, model transformations are represented using an implementation-agnostic graphical syntax, quite close to that used in the graph transformation literature. In this representation, metamodels are given as class diagrams, the abstract syntax of models is given as object diagrams and model transformations are represented as a collection of rules, where each rule is defined as a pair of model patterns, called left-hand side (LHS) and right-hand side (RHS). The notions of metamodel, model and model pattern correspond to those of type graph, attributed graph with containments and node inheritance, and graph pattern in the graph transformation literature [5,10]. For example, the rules A->C and R->FK of Fig. 1 map attributes to columns. The \$ before a variable denotes string interpolation.

Graph patterns in rules can be augmented with universally quantified variables (represented by an overlaid box). Moreover, rules are augmented with a when clause to express conditions that must be satisfied by the variables in LHS, and with a where clause to indicate how variables from LHS and from RHS are related via the application of other rules, expressed as two graph patterns. Formulas in a when clause may be expressed in conjunctive form, as all filter conditions must be satisfied in order for the rule to be applied, whereas formulas in a where clause may be expressed in disjunctive form (assuming mutually exclusive conditions), as all the side effects expressed in a where clause must be evaluated. The variables of RHS of the main rule must appear either in the LHS of the main rule or in the RHS of a where transformation step. The rule C->T of Fig. 1 illustrates how to map a class to a table with a primary key column PK COL. For each attribute A whose type is a DataType, the corresponding column is obtained by applying the rule A->C. For each attribute OTHER whose type is the class C, matched in LHS of the main rule, a new foreign key column is added to the table T by applying the rule R->FK.

**Fig. 1.** Metamodels, example and transformation rules.

From an operational point of view, transformation rules are applied unidirectionally from LHS to RHS performing an out-place transformation following two steps. First, during the *matching phase*, matches for the rules in the model transformation are found as long as they are not shared by different rules and these are included in a set *matchPool*. A match is formally defined as a graph morphism from LHS to the source graph, which satisfies the when conditions, but it is represented as a map from variables to object identifiers for the sake of presentation in this paper.

Second, during the *execution phase*, each match is processed by triggering the application of a transformation rule, which is represented as a transformation step, denoted by $r : \overrightarrow{\mathit{in} \mapsto \varsigma} \rightarrow \overrightarrow{\mathit{out} \mapsto \varsigma}$, which consists of a labelled pair of two matches: the match for the input pattern of the rule, which enables its application, and the match for the output pattern of the rule, with the objects that result from applying the rule. When a rule is applied, the source model is only used for query purposes but the target model is constructed by adding the pattern of the RHS instantiated with values from the variables both in the LHS and in the RHS of where transformation steps. In addition, where transformation steps may further expand the structure of the target model. This execution model resembles the application of forward rules used in triple graph grammars (TGGs) [22], where the source graph is annotated as rules are applied and only the target graph is constructed together with a link in a correspondence graph, where each link denotes a transformation step.
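The two phases can be illustrated with a minimal Python sketch. It is not the YAMTL implementation: the names `Rule`, `Step` and `run_batch`, the restriction to rules with a single input element, and the dictionary-based model encoding are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    in_type: str        # type of the single input element (simplification)
    when: Callable      # filter condition over a candidate source object
    build: Callable     # creates the output object from the matched input


@dataclass
class Step:
    rule: str
    in_match: dict      # variable name -> source object id
    out_match: dict     # variable name -> created target object


def run_batch(rules, source):
    """source maps an object id to a dict with at least a 'type' key."""
    # Matching phase: collect valid matches; an object is claimed by one rule only.
    match_pool, claimed = [], set()
    for rule in rules:
        for oid, obj in source.items():
            if oid in claimed or obj["type"] != rule.in_type:
                continue
            if rule.when(obj):
                match_pool.append((rule, {"in": oid}))
                claimed.add(oid)
    # Execution phase: the source is only queried, the target is built from scratch.
    steps = []
    for rule, in_match in match_pool:
        out = rule.build(source[in_match["in"]])
        steps.append(Step(rule.name, in_match, {"out": out}))
    return steps


# Hypothetical source model with one class and one attribute.
source = {
    "c1": {"type": "Class", "name": "Order"},
    "a1": {"type": "Attribute", "name": "date"},
}
rules = [
    Rule("C->T", "Class", lambda o: True,
         lambda o: {"type": "Table", "name": o["name"]}),
    Rule("A->C", "Attribute", lambda o: True,
         lambda o: {"type": "Column", "name": o["name"]}),
]
steps = run_batch(rules, source)
```

Each resulting `Step` pairs an input match with an output match, which is the information a transformation step carries in the description above; where clauses and multi-element patterns are deliberately left out.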

## **3 Delta-Driven Model Transformations**

This section presents the mechanism to propagate documented deltas δ*<sup>s</sup>* from a source model M*<sup>s</sup>* to a target model M*<sup>t</sup>* in an incremental way, when the (unidirectional) synchronization correspondence between these two models is represented with a model transformation t as described in the previous section. This has been implemented in the YAMTL transformation engine [6], which has been extended with two modes of execution: in *initialization* mode, the transformation is executed in batch mode but, additionally, tracks those parts of the source model involved in transformation steps as *dependencies*; in *propagation* mode, the transformation is executed incrementally for a given source delta.

In order for a model transformation to be executed in propagation mode, it first needs to be executed in initialization mode in order both to create transformation steps and to inject the dependencies that facilitate the analysis of the impact of changes in the already executed model transformation. Therefore, the transformation t is applied to M*<sup>s</sup>* using the original batch semantics [6] while injecting dependencies in the transformation engine. Once the initialization is done, any number of source forward deltas δ*<sup>s</sup>* can be propagated.

Given a source documented delta $\delta_s$ between a source model $M_s$, already synchronized with a target model $M_t$ via a model transformation $t : M_s \xrightarrow{*} M_t$ (where $\xrightarrow{*}$ denotes a sequence of transformation steps), and an updated source model $M'_s$, the transformation engine propagates the model update $\delta_s$ along $t$. The effect of this forward propagation is the application of an update $\delta_t$ on the target model $M_t$.

In the following subsections, we explain the different phases of the new execution modes, initialization and propagation, in more detail. As the initialization mode faithfully corresponds to the batch execution of a model transformation, the discussion of this mode focuses on the type of dependencies that are injected in the transformation engine in Sect. 3.1. The discussion on the propagation mode focuses on how deltas are represented in Sect. 3.2. Then, the two main phases of the propagation execution mode, namely impact analysis and delta propagation, are explained in Sects. 3.3 and 3.4, respectively.

#### **3.1 Dependency Injection**

When running a model transformation in initialization mode, the engine monitors the source model and whenever an object ς is matched or a feature call, represented as a pair (ς,f) of an EMF object ς and a feature name f, is performed, a dependency is injected into the dependency registry. A dependency thereby links either an object ς or a feature call (ς,f) to the transformation steps $r : \overrightarrow{\mathit{in} \mapsto \varsigma} \rightarrow \overrightarrow{\mathit{out} \mapsto \varsigma}$ in which it is used. Such dependencies are detected both during the matching phase and during the execution phase.

In the matching phase, while finding a match for a rule, the engine keeps track of all of the feature calls used in both element and rule when conditions. When a match is found to be valid, the collection of dependencies is injected into the dependency registry for the transformation step that uses that match. Otherwise, when the match is not valid, the collected dependencies are discarded. Additionally, when inserting a match in the *matchPool*, the transformation engine also records reverse matches as injected dependencies between matched objects ς and the transformation step in which they are matched.

**Table 1.** Analysis of dependencies for the initial MT $t : M_s \xrightarrow{*} M_t$ of Fig. 2.

Dependencies may also be found when executing a transformation step, e.g., while executing initialization expressions associated with attributes in model patterns in RHS and in where clauses. In such cases, the transformation engine injects a dependency for the transformation step every time a feature call in the source model is detected. As a result, note that several transformation steps may depend on the same object ς, when rules have more than one single input element, or on the same feature call (ς,f).

Table 1 shows the dependencies that are found when executing the transformation of Fig. 1 in initialization mode from model M*s*. Each row in the table represents a transformation step, where: the source match indicates where the rule has been applied, the target match indicates what objects were created, and dependencies refers to the set of feature calls associated with a transformation step. Reverse matches are extracted from source matches, by reading them in the opposite direction.

Dependency injection is configured with an aspect whose pointcut matches feature calls under a user-defined namespace. Hence, the model transformation engine is entirely decoupled from the domain model at design time. They become tightly coupled at compilation time and, hence, at run time.
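The following Python sketch illustrates the idea of recording feature calls as dependencies while a rule is being matched and executed. It is only an analogy: the engine described here weaves an aspect into the domain model code, whereas the sketch intercepts attribute reads with a wrapper class. The names `DependencyRegistry` and `Monitored`, the object identifiers and the `abstract` feature are hypothetical.

```python
from collections import defaultdict


class DependencyRegistry:
    def __init__(self):
        self.by_feature_call = defaultdict(set)  # (object id, feature) -> step ids
        self.reverse_matches = defaultdict(set)  # object id -> step ids
        self._pending = None                     # feature calls seen for the current match

    def begin(self):
        self._pending = set()

    def feature_called(self, oid, feature):
        if self._pending is not None:
            self._pending.add((oid, feature))

    def commit(self, step_id, matched_oids):
        # The match is valid: inject the collected dependencies for this step.
        for call in self._pending:
            self.by_feature_call[call].add(step_id)
        for oid in matched_oids:
            self.reverse_matches[oid].add(step_id)
        self._pending = None

    def discard(self):
        # The match is not valid: the collected dependencies are dropped.
        self._pending = None


class Monitored:
    """Wraps a source object; every feature read is reported to the registry."""

    def __init__(self, oid, data, registry):
        self._oid, self._data, self._registry = oid, data, registry

    def __getattr__(self, feature):
        # Invoked only for names not found on the wrapper, i.e. model features.
        try:
            value = self._data[feature]
        except KeyError:
            raise AttributeError(feature) from None
        self._registry.feature_called(self._oid, feature)
        return value


registry = DependencyRegistry()
order = Monitored("c1", {"name": "Order", "abstract": False}, registry)

registry.begin()
if not order.abstract:        # a when condition reads (c1, abstract)
    _ = order.name            # an RHS initialization reads (c1, name)
    registry.commit("step1", ["c1"])
else:
    registry.discard()
```

The rule code only reads features of the wrapped object and never refers to the registry, which mirrors the design-time decoupling between domain model and transformation engine described above.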

#### **3.2 Representable Deltas**

The EMF change model [24] is used to represent deltas to an instance of any other EMF model. It is built into EMF and, therefore, available for any EMF-compliant tool. In this section, we describe how a documented delta is represented with the EMF change model and how it can be automatically defined given any potentially *live* atomic update.

A delta consists of a ChangeDescription which contains a map of objectChanges, which refer to those objects that are updated and, for each such object, it contains a list of FeatureChanges. A FeatureChange (FC) refers to the structural feature that needs to be updated and provides the new value. For a single-valued attribute, a FeatureChange contains the new dataValue. For references and multi-valued attributes, a FeatureChange includes a containment reference listChanges pointing to ListChange. ListChanges are used to represent addition to, removal from, or movement *within* the given feature values. In particular, movement only captures when an object changes to a different index within the collection. However, it does not capture structural changes, e.g. change of container, which are represented as a removal from and an addition to the corresponding containment references. When a FeatureChange refers to a containment reference, objects to be added are pointed to by objectsToAttach and objects to be removed are pointed to by objectsToDetach.

FeatureChanges capture when a feature value is updated for an object but EMF also permits adding and removing root objects to a resource, representing the model in memory, which need not be contained by any other object. Such changes are considered to be performed on the resource itself and are represented with ResourceChanges, one for each changed resource. A ResourceChange (RC) contains the ListChanges for the root objects of the corresponding resource, similarly to multi-valued features. For a more detailed explanation of the EMF change model, we refer the reader to [24].
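To make the shape of such a delta concrete, the sketch below mirrors the concepts named above as Python dataclasses. It is a simplified illustration and not the EMF API: the field names are Python-style renderings, objectsToAttach and objectsToDetach are omitted, and the object identifiers and the resource name `model.xmi` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeKind(Enum):
    ADD = "add"
    REMOVE = "remove"
    MOVE = "move"   # index change within the same collection only


@dataclass
class ListChange:
    kind: ChangeKind
    values: list            # object ids or data values
    index: int = 0


@dataclass
class FeatureChange:
    feature: str
    data_value: object = None                          # single-valued attribute
    list_changes: list = field(default_factory=list)   # references, multi-valued attributes


@dataclass
class ResourceChange:
    resource: str
    list_changes: list = field(default_factory=list)   # changes to the root contents


@dataclass
class ChangeDescription:
    object_changes: dict = field(default_factory=dict)   # object id -> [FeatureChange]
    resource_changes: list = field(default_factory=list)


# Renaming a class: a FeatureChange on a single-valued attribute.
rename = ChangeDescription(
    object_changes={"c1": [FeatureChange("name", data_value="Invoice")]}
)

# Adding a root object: a ResourceChange with one ADD ListChange.
add_root = ChangeDescription(
    resource_changes=[
        ResourceChange("model.xmi", [ListChange(ChangeKind.ADD, ["c3"])])
    ]
)
```

A structural move of an object would appear in this encoding as one REMOVE and one ADD in different list changes, since MOVE only covers index changes within a collection.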

Table 2 shows a classification of atomic model updates that are representable with the EMF change model as explained above. Note that moving an object structurally (case 12, *move (inter.)*) is represented in a composite delta by two opposite actions: removing the object either from the root contents of the resource, if it is a root object (case 2), or from a containment reference, if it is a contained object (case 10); and adding it either to the root contents of the resource, if it is to become a root object (case 1), or to another containment reference in another container object (case 9). This case is not captured by the EMF change model explicitly but the transformation engine is able to infer it, as explained in the following section.


**Table 2.** Summary of model update types, with their representation in EMF.

A delta, which may represent atomic and composite changes, is defined as an instance of the EMF change model and can be serialized. EMF also provides facilities for applying them and reversing them. Furthermore, EMF provides a change recorder, which enables recording *live updates* as a ChangeDescription for either a root object, a collection of root objects, a resource or a resource set. The resulting ChangeDescription is the representation of a *history scenario* [4], from the updated model to the original one, which is optimized. That is, atomic changes for the same feature of the same object may be discarded or merged, as long as the optimization process preserves reversibility. Hence, reversing the recorded delta may yield fewer changes than were originally made. Reversed deltas represent *documented scenarios* and can be propagated along a model transformation, as discussed in subsequent sections.
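The relation between a recorded history delta and the documented delta obtained by reversing it can be sketched in Python as follows. This is a toy model restricted to single-valued features, not the EMF change recorder; `Recorder`, `end_recording` and `apply_and_reverse` are hypothetical names chosen for the illustration.

```python
class Recorder:
    """Records, per (object id, feature), the value held before recording began."""

    def __init__(self, model):
        self.model = model      # object id -> {feature: value}
        self.history = {}       # (object id, feature) -> original value

    def set(self, oid, feature, value):
        # Only the first write matters for rollback: later writes are merged.
        self.history.setdefault((oid, feature), self.model[oid][feature])
        self.model[oid][feature] = value

    def end_recording(self):
        # Entries whose net effect is nil (value restored) are discarded.
        return {key: old for key, old in self.history.items()
                if self.model[key[0]][key[1]] != old}


def apply_and_reverse(model, delta):
    """Applies delta in place and returns the delta that undoes it."""
    reverse = {}
    for (oid, feature), value in delta.items():
        reverse[(oid, feature)] = model[oid][feature]
        model[oid][feature] = value
    return reverse


model = {"c1": {"name": "Order"}}
recorder = Recorder(model)
recorder.set("c1", "name", "Bill")
recorder.set("c1", "name", "Invoice")

history = recorder.end_recording()              # from updated model to original
documented = apply_and_reverse(model, history)  # rolls back, yields forward delta
```

Two live updates were made to the same feature, yet the documented delta holds a single change, which illustrates why reversing a recorded delta may yield fewer changes than were originally made.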

**Fig. 2.** Source/target metamodels, initial synchronized models and forward delta propagation (a–e).

The EMF change recorder makes it possible to defer the observation of updates to the point at which they occur, which saves memory resources, and it facilitates interoperability. Furthermore, recorded (history) deltas can be regarded as a rollback mechanism for implementing transactional model updates, which may be performed live.

Figure 2 shows examples of documented deltas, defined over the source model M*<sup>s</sup>* of the running example. Such deltas are representable as EMF model changes, i.e. operationally, but are graphically depicted using the abstract syntax of M*s*, using their state-based representation for the sake of presentation. Additions and updates, including moves, are highlighted in grey colour. Objects that are added, and thus created, have a new identifier. Objects that are updated and/or moved preserve their identifier. Removals are highlighted by using dashed lines for the contour lines of the corresponding shapes. The given deltas are instantiations of case 4 (delta a), changing the name of the class Order to Invoice; case 1 (delta b), adding a root class Product; case 9 (delta c), adding a single-valued attribute amount to class Item; case 10 (delta d), removing the attribute date from class Item; and case 11 (delta e), structurally moving the attribute date from class Item to class Order.

In the following subsections, the different phases of the procedure for forward propagation of source deltas are discussed, and the aforementioned examples will be used to illustrate them.

#### **3.3 Impact Analysis**

In this subsection, we discuss how source documented deltas are analyzed in order to determine which transformation steps are affected by source changes. This analysis is comprised of three main steps: identification of atomic model updates from a documented delta, initialization of locations for newly enabled rules, and marking of transformation steps impacted by changes.

*Identification of atomic model updates.* In the first step, the transformation engine infers which objects and which feature calls have been impacted by changes. For objects, it also infers whether an object has been added or removed, ignoring if the object is moved, either within the same collection or structurally.

For affected objects, such information is recorded in the set *DO* of *dirty objects* of the form (ς, *ctype*), where ς is the affected object and *ctype* is the type of change from the set {ADD, DEL}. To obtain a dirty object from the delta, FeatureChanges and ResourceChanges are traversed considering two cases: when an object ς is added either to a containment feature (for a FeatureChange) or to the root contents of the resource (for a ResourceChange) and such object is not removed elsewhere in the delta, either from a containment reference or from the root contents of the resource; and, similarly, when an object is deleted and it is not added elsewhere in the delta. *DO* is augmented with (ς, ADD) in the first case and with (ς, DEL) in the second case.
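The two cases above amount to a set difference between additions and removals. The Python sketch below assumes that the traversal of FeatureChanges and ResourceChanges has already produced two collections of object identifiers; the function name `dirty_objects` and the identifiers are hypothetical.

```python
def dirty_objects(additions, removals):
    """additions / removals: ids of objects added to / removed from any
    containment reference or resource root contents in the delta."""
    added, removed = set(additions), set(removals)
    # Added and not removed elsewhere in the delta: a genuinely new object.
    do = {(oid, "ADD") for oid in added - removed}
    # Removed and not added elsewhere in the delta: a genuinely deleted object.
    do |= {(oid, "DEL") for oid in removed - added}
    return do


# Adding a new attribute a3 to a class.
do_add = dirty_objects(additions=["a3"], removals=[])

# Removing attribute a2 from a class.
do_del = dirty_objects(additions=[], removals=["a2"])

# Structurally moving a2: it is removed from one container and added to another.
do_move = dirty_objects(additions=["a2"], removals=["a2"])
```

An object that is both removed and added within the same delta produces no dirty object, which is how a structural move is ignored at this stage.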

For affected feature calls, such information is recorded in the set *DFC* of *dirty feature calls* of the form (ς,f), where ς is an object and f is a feature name. For each FeatureChange of an ObjectChange, the dirty feature call (ς,f) with the object ς referred to by the ObjectChange and the feature name f referred to by the FeatureChange is added to *DFC*.

**Table 3.** Impact analysis of source deltas a–e.

Table 2 shows how atomic model update types are represented using the EMF change model (column *delta representation*) and, internally, using the sets *DO* and *DFC*. Table 3 shows the sets *DO* of dirty objects and *DFC* of dirty feature calls for the source deltas of Fig. 2. Note that the sets *DO* and *DFC* decouple the transformation engine from the EMF change model and provide another entry point for defining deltas programmatically, which can be used for capturing atomic *live changes* received via EMF adapters.

*Initialization of delta locations.* For each dirty object (ς, ADD), the object ς is added to the extent associated with *type*(ς) in the location map used for delta propagation. This potentially enables new matches when rules are matched during the delta propagation phase.

*Marking of impacted transformation steps.* In this step, transformation steps that are affected by the atomic changes in the source delta are marked as dirty. For each dirty object (ς, ADD) ∈ *DO*, the extent of type *type*(ς) is augmented with ς. This will potentially enable new matches for some rule during the change propagation phase. For each dirty object (ς, DEL) ∈ *DO*, we obtain the list of transformation steps that are affected from the map of reverse matches. Such transformation steps will then remain transient and the objects in their target match will not be linked to other objects in the target models. In particular, note that when processing root objects or a containment reference, an object that is removed in the delta is not present in the updated source model and, therefore, it does not trigger the transformation step that had been executed in the initial transformation.

For each dirty feature call (ς,f) ∈ *DFC* we obtain the list of transformation steps that are affected from the registry of dependencies. For each such transformation step, the satisfaction of its source match is checked. If such source match is still valid, then it is inserted into *matchPool*Δ, the pool of matches that are used to schedule rule applications during the change propagation phase.
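The marking step can be sketched roughly as follows. This is only an illustrative Python sketch: the dictionaries `reverse_matches` and `dependencies`, the callback `source_match_valid`, and the function name `mark_impacted` are hypothetical stand-ins for the map of reverse matches, the registry of dependencies and the source match check described above, not YAMTL's actual data structures.

```python
def mark_impacted(dirty_objects, dirty_feature_calls, extents, type_of,
                  reverse_matches, dependencies, source_match_valid):
    """Return the dirty transformation steps and the pool of re-scheduled ones."""
    dirty_steps = set()
    match_pool_delta = []  # steps whose source match is still valid

    for obj, kind in sorted(dirty_objects):
        if kind == "ADD":
            # New objects extend the type extent and may enable new matches later.
            extents.setdefault(type_of[obj], set()).add(obj)
        else:  # "DEL": steps that matched the deleted object become dirty
            dirty_steps.update(reverse_matches.get(obj, ()))

    for call in sorted(dirty_feature_calls):
        for step in dependencies.get(call, ()):
            dirty_steps.add(step)
            # Only steps with a still valid source match are re-scheduled.
            if source_match_valid(step) and step not in match_pool_delta:
                match_pool_delta.append(step)

    return dirty_steps, match_pool_delta


extents = {"C": {"c1"}}
dirty, pool = mark_impacted(
    dirty_objects={("c2", "ADD"), ("a1", "DEL")},
    dirty_feature_calls={("c1", "name")},
    extents=extents,
    type_of={"c2": "C"},
    reverse_matches={"a1": ["step_A1"]},
    dependencies={("c1", "name"): ["step_C1"]},
    source_match_valid=lambda step: step != "step_A1",
)
```

In this toy run, the step that matched the deleted object is marked dirty but not re-scheduled, whereas the step that read the changed feature is both marked and placed in the pool.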

For each atomic change in Fig. 2, Table 3 shows the marking of transformation steps that are (re-)scheduled according to the dependencies of Table 1. In particular, if a transformation step is re-scheduled, its current source and target matches are included, it is marked as dirty and included in *matchPool*Δ. If a transformation step is not to be re-executed, it is simply marked as dirty. New transformation steps, with fresh matches due to new objects, are scheduled in *matchPool*Δ. This last step is actually achieved by augmenting the corresponding type extent with the new objects and the matches are scheduled during the change propagation phase, explained in the next subsection.

## **3.4 Change Propagation**

After the impact analysis phase, delta propagation proceeds by executing a model transformation using the matching and execution phases, as outlined in Sect. 2. Figure 2 illustrates the propagation of source deltas according to the model transformation of Fig. 1. We highlight how incrementality has been considered in these two phases below.

*Matching Phase.* During the matching phase (in batch/initialization execution mode), matches for a given rule are found by traversing objects from the extent of the types associated with the elements of the source pattern of the rule, with the constraints specified in the form of graphical patterns and when conditions. In propagation mode, the transformation engine employs the same pattern matching algorithm but it fetches objects from the location map used for delta propagation, initialized during the change impact analysis phase. Therefore, new matches may be found for objects that have been created by the source delta. Those matches are inserted both into *matchPool* and *matchPool*Δ, scheduling new transformation steps. Table 3 shows that two new transformation steps are scheduled, one for rule C->T in delta b, and one for rule A->C in delta c.

*Execution Phase.* During the execution phase, transformation steps determined by the matches in *matchPool*Δ are executed. Such matches originate from the impact analysis phase, corresponding to transformation steps that are *dirty* and need to be re-executed, and from the matching phase above, corresponding to new transformation steps.

The re-execution of a transformation step is performed as in the batch/initialization mode except for the creation of transformation steps. Whereas a newly scheduled transformation step needs to get its output objects initialized (instantiated for output elements), a dirty transformation step *reuses* the objects of the target match and unsets their features. This avoids loss of contextual information, which is not affected by changes, when re-executing a transformation step. In particular, those references to output objects that emerge from the external context are preserved. On the other hand, references from those output objects are re-calculated by re-executing the transformation step. It is worth noting that the transformation engine uses where clauses to define references to objects that are created by other rules, and these clauses in turn use a cache mechanism to avoid re-executing the transformation step that produced such an object. Therefore, when a dirty transformation rule is re-executed, the initialization of output element bindings is performed again. However, those bindings that are initialized in a where clause are also initialized incrementally. That is, only those objects that belong to a match of a newly scheduled transformation step will be transformed from scratch. References to already initialized objects will simply be fetched. Hence, the granularity of the target delta is as fine-grained (at binding level) as the source delta for the underlying graph structure of the model.

## **4 Performance Analysis**

For the empirical analysis of the incremental execution of model transformations in YAMTL using the propagation procedure presented above, we have used the VIATRA CPS benchmark [27]. The transformation *YAMTL-incr* implemented for our model transformation engine passes the sanity checks of the benchmark. The software artifacts used in this section and the results obtained are publicly available in a GitHub repository [7] and YAMTL is available at https://yamtl.github.io/.

This evaluation is an extension of the one performed for the batch component of the VIATRA CPS benchmark in [6]. From the original VIATRA CPS benchmark, two incremental variants of the transformation implemented with *EMF-IncQuery* have been selected: *ExplicitTraceability* (EXPL) [25] and *QueryResultTraceability* (QRT) [26], out of which the first one is the best performing solution up to now. These transformations have been extracted as independent Java projects. Classes implementing them have been kept intact in the new projects, including their namespaces, so that errors are not introduced due to lack of expertise. Although these two transformations produce results that are different from the other transformations, the main differences are due to reordering of multi-valued references and we have considered them valid for this evaluation. On the other hand, a benchmark measurement harness considering the best practices recommended by the VIATRA team [13] was developed in order both to fine-tune measurements and to crosscheck results. This harness removes dependencies to other components of the VIATRA CPS benchmark so that experiments can be run locally.

In the present work, we aimed at answering the following research questions: *(RQ1)* Does *YAMTL-incr* show any performance penalty w.r.t. its execution in batch mode (*YAMTL-batch*)? *(RQ2)* Does *YAMTL-incr* show any improvement in performance w.r.t. *EXPL* or *QRT* during the initialization phase? *(RQ3)* And during the propagation phase?

From the scenarios provided in the original benchmark, the scenarios *client-server* and *statistic based* [29] were considered. The CPS model generator [28] was used to obtain the input models to be used for the analysis so that their size depends on a logarithmic factor. The biggest models considered, in the client-server scenario, consist of millions of nodes (10.16M) and edges (27.53M) and are, hence, VLMs.

For each tool and scenario, the experiments are run in isolation, i.e. in a separate Java process. For each of the input models, an initial experiment is performed to warm up the JVM and then twelve more experiments are run to measure performance. Each experiment consists of four phases: model load and engine initialization, initial transformation, delta propagation and model storage. Between consecutive phases, the harness sends hints to the JVM to run garbage collection and waits for one second before proceeding to the next phase. The first phase includes the instantiation of a fresh engine instance, avoiding interference between experiments as caches are not reused. The delta propagation phase includes the application of the delta to the source model and its propagation. Only initial transformation and delta propagation times have been considered in the quantitative analysis. For each of these two phases, the minimum and the maximum of the twelve measurements are removed and the median of the remaining ten is reported.
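The aggregation of the measurements can be illustrated with a minimal Python sketch. The function name `reported_time` and the sample values are made up for this example and are not taken from the benchmark harness or its results.

```python
from statistics import median


def reported_time(measurements):
    """Median of the measurements after dropping the minimum and the maximum."""
    if len(measurements) != 12:
        raise ValueError("expected twelve measured runs (warm-up run excluded)")
    trimmed = sorted(measurements)[1:-1]  # ten remaining values
    return median(trimmed)


# Hypothetical propagation times in milliseconds, with one outlier (95).
times = [12, 10, 11, 10, 95, 10, 11, 12, 10, 11, 9, 10]
result = reported_time(times)
```

Since exactly one value is removed at each end of the sorted list, the two middle values are unchanged, so for twelve runs the reported number coincides with the plain median of all twelve.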

In both solutions *EXPL* [25] and *QRT* [26], the delta is applied to the source model by directly modifying the resource containing the model. In the solution with YAMTL such delta was recorded and persisted using the EMF change model as described in Sect. 3.2. To analyze whether this feature could become a threat to validity, a separate experiment was run by excluding the query part of the model update (searching for the objects to be updated) in the solution *EXPL* but this change did not affect performance results perceptibly and the original solutions provided by the authors of the VIATRA CPS benchmark were considered. Therefore, the actions performed during the propagation phase are equivalent in all of the evaluated solutions.

**Fig. 3.** Performance of initialization (top) and delta propagation (bottom).

Figure 3 shows the performance results obtained both for the initial model transformation and for forward delta propagation for the models generated for the client-server scenario. Scales both for time (ms.) along Y axis and for model size factors along X axis are logarithmic allowing us to compare the scalability of the different approaches. In the initialization phase, we have included the execution of YAMTL in batch mode (*YAMTL-batch*) over the source model, and it can be seen that tracking dependencies incurs a small penalty. However, the other two solutions (*EXPL* and *QRT*) operate several orders of magnitude slower. In the propagation phase, it can be observed that while *YAMTL-incr* exhibits a constant propagation time (in μs.) for the source delta, the cost of the other solutions depends on the size of the input model. Furthermore, for the other incremental approaches, when both initial and propagation time are combined their performance worsens due to their costly initialization phase.

## **5 Related Work**

In this section, we discuss techniques used in related work for achieving incrementality in both reactive and bidirectional model transformation.

Reactive model transformations [3,21] enable the propagation of model updates from source models to target models on demand. State-of-the-art tool support relies on notification mechanisms, enabling live detection of source model updates either for immediate processing, as in VIATRA [3], or for deferred processing, as in ReactiveATL [21]. In these approaches, source model update notifications are usually fine-grained and kept in memory. Such notifications can only be detected when the transformation engine is in memory (live) as well. The use of a notification mechanism means that models are *loosely coupled* to the transformation engine. Working with offline model updates, as in the proposed delta propagation procedure, completely decouples detection of deltas from the transformation engine, freeing model update developers from the overhead of having the transformation infrastructure in memory. The latter is only needed for propagating changes but not for defining them. In reactive approaches, when an observer receives an update notification, information about the intent of the overall model delta, i.e. the contextual information relating different atomic updates, is lost. This problem is avoided using documented deltas, which may be serialized, enabling their processing (e.g. aggregating composite changes like the *move operation*) and their optimization (reduction of atomic operations that are cancelled when composed). We refer the reader to [9] for an additional discussion of delta-based model updates against state-based model updates.

Among bidirectional model transformation approaches, Triple Graph Grammars (TGG), introduced in [22], are a declarative approach for specifying bidirectional consistency relations between models. Although our approach is not bidirectional, it is worth comparing how incrementality is supported in operational TGG rules. Incrementality was first introduced in TGG synchronization in [11,12]. Efficient approaches for TGG synchronization [18–20] avoid analyzing the whole model by relying on dependencies which hint at the impact of a model update directly. Precedence-based approaches [18,20] keep a binary precedence relation over the set of model elements in order to determine when creation or deletion of a model element affects another one. While [18] overestimates the actual dependencies by defining them at the type level, others underestimate them relying on user feedback [20] or on special correspondences [12]. The approach in [19] decouples impact analysis of model updates from consistency restoration by delegating the former to VIATRA's incremental pattern matcher, which has a built-in dependency tracker, and by defining operational rules using a reactive model transformation approach. However, these two phases are still tightly coupled using a synchronous communication mechanism between the incremental pattern matcher and the synchronization procedure since the pattern matcher may trigger revocations/applications of forward marking rules after revoking/applying one of them. That is, the model synchronization procedure uses the pattern matcher to know when synchronization terminates. In the delta propagation mechanism proposed in the present work, neither the revocation of applied transformation steps nor the creation of new transformation steps can trigger further applications because rule matches are computed against the source model and they are unique, that is, the same match cannot enable two different rules. A new transformation step may be found when new elements are inserted in the source model. On the other hand, when a transformation step is revoked, no other rule can be applied or a conflict would have been detected when the rule was applied the first time.

Some transformation engines with support for bidirectional transformations, like NMF [14,15], support the offline representation of model deltas. However, to the best of our knowledge, none of the aforementioned approaches uses a standardized notation for them, such as the EMF change model, which can be regarded as the de-facto standard for representing model deltas in the EMF modeling tool ecosystem.

## **6 Concluding Remarks**

The main contribution of this work is the design of a delta propagation procedure for executing delta-driven model transformations, which has been implemented in YAMTL. The novelty of the approach consists in the use of a standardized representation of model deltas, which facilitates interoperability with EMF-compliant tools, and in the use of a dependency injection mechanism, which allows the transformation engine to be aware of model updates without having to rely on a publish-subscribe infrastructure. The VIATRA CPS benchmark has been used to justify that *(1)* the initialization transformation in YAMTL is several orders of magnitude faster than the up-to-now fastest incremental solutions and that *(2)* propagation of sparse deltas can be performed in real time for VLMs, independently of their size, whereas other solutions show a clear dependence on their size. Hence, YAMTL shows satisfactory scalability in incremental execution of model transformations on VLMs. Additional studies with larger classes of models will be considered in future work.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# A Logic-Based Incremental Approach to Graph Repair

Sven Schneider<sup>1(✉)</sup>, Leen Lambers<sup>1</sup>, and Fernando Orejas<sup>2</sup>

<sup>1</sup> Hasso Plattner Institut, University of Potsdam, Potsdam, Germany Sven.Schneider@HPI.de

<sup>2</sup> Universitat Politècnica de Catalunya, Barcelona, Spain

Abstract. Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond: For example, in model-driven engineering, the abstract syntax of models is usually encoded using graphs. Flexible edit operations temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. Similarly, in graph databases (managing the storage and manipulation of graph data), updates may cause a given database to violate some integrity constraints, which also requires graph repair.

We present a logic-based incremental approach to graph repair, generating a sound and complete (upon termination) overview of least changing repairs. In our context, we formalize consistency by so-called graph conditions, which are equivalent to first-order logic on graphs. We present two kinds of repair algorithms: State-based repair restores consistency independent of the graph update history, whereas delta-based (or incremental) repair takes this history explicitly into account. Technically, our algorithms rely on an existing model generation algorithm for graph conditions implemented in AutoGraph. Moreover, the delta-based approach uses the new concept of satisfaction trees (STs) for encoding if and how a graph satisfies a graph condition. We then demonstrate how to manipulate these STs incrementally with respect to a graph update.

## 1 Introduction

Graph repair, restoring consistency of a graph, plays a prominent role in several areas of computer science and beyond. For example, in model-driven engineering, models are typically represented using graphs and the use of flexible edit operations may temporarily create inconsistent graphs not representing a valid model, thus requiring graph repair. This includes the situation where different views of an artifact are represented by a different model, i.e., the artifact is described by a multi-model, see, e.g. [6], and updates in some models may cause a global inconsistency in the multi-model. Similarly, in graph databases (managing the storage and manipulation of graph data), updates may cause a given database to violate some integrity constraints [1], which also requires graph repair.

F. Orejas has been supported by the Salvador de Madariaga grant PRX18/00308 and by funds from the Spanish Research Agency (AEI) and the European Union (FEDER funds) under grant GRAMM (ref. TIN2017-86727-C2-1-R).

Numerous approaches on model inconsistency and repair (see [12] for an excellent recent survey) operate in varying frameworks with diverse assumptions. In our framework, we consider a typed directed graph (cf. [7]) to be inconsistent if it does not satisfy a given finite set of constraints, which are expressed by graph conditions [8], a formalism with the expressive power of first-order logic on graphs. A graph repair is, then, a description of an update that, if applied to the given graph, makes it consistent. Our algorithms do not just provide one repair, but a set of them from which the user must select the right repair to be applied. Moreover, we derive only least changing repairs, which do not include other smaller viable repairs. Our approach uses techniques (and the tool AutoGraph) [17] designed for model generation of graph conditions.

We consider two scenarios: In the first one, the aim is to repair a given graph (state-based repair). In the second one, a consistent graph is given together with an update that may make it inconsistent. In this case, the aim is to repair the graph in an incremental way (delta-based repair).

The main contributions of the paper are the following ones:


Summarizing, most repair techniques do not provide guarantees for the functional semantics of the repair and suffer from lack of information for the deployment of the techniques (see conclusion of the survey [12]). With our logic-based graph repair approach we aim at alleviating this weakness by presenting formally its functional semantics and describing the details of the underlying algorithms.

The paper is organized as follows: After introducing preliminaries in Sect. 2, we proceed in Sect. 3 with defining graph updates and repairs. In Sect. 4, we present the state-based scenario. We continue with introducing satisfaction trees in Sect. 5 that are needed for the delta-based scenario in Sect. 6. We close with a comparison with related work in Sect. 7 and conclusion with outlook in Sect. 8. For proofs of theorems and example details we refer to our technical report [18].

## 2 Preliminaries on Graph Conditions

We recall graph conditions (GCs), defined here over typed directed graphs, used for representing properties on such graphs. In our running example<sup>2</sup>, we employ the type graph *TG* from Fig. 1 and we use nodes with names a<sub>i</sub> and b<sub>i</sub> to indicate that they are of type :A and :B, respectively.

[Figure: type graph *TG* over the node types :A and :B and the edge types :E<sub>1</sub> and :E<sub>2</sub>]

$$\psi = \neg \exists (a, \neg(\exists(a \xrightarrow{e} b, \mathit{true}) \land \neg \exists (a \circlearrowleft^{e}, \mathit{true})))$$

Fig. 1. The type graph *TG* (left) and the GC *ψ* (right) for our running example

<sup>1</sup> Note that completeness implies totality (if the given set of constraints is satisfiable by a finite graph, then the algorithms will find a repair for any inconsistent graph).

<sup>2</sup> We refer to Sect. 1 with pointers to related work including diverse use cases in Software Engineering for graph repair with more complex and motivating examples.

GCs state facts about the existence of graph patterns in a given graph, called a host graph. For example, in the syntax used in our running example, the GC ∃(a, *true*) means that the host graph must include a node of type :A. Also, ∃(a → b, *true*) means that the host graph must include a node of type :A, another node of type :B, and an edge from the :A-node to the :B-node.

In general, in the syntax that we use in our running example, an atomic GC is of the form ∃(H, φ) (or ¬∃(H, φ)) where H is a graph that must be (or must not be) included in the host graph and where φ is a condition expressing more restrictions on how this graph is found (or not found) in the host graph. For instance, ∃(a, ¬∃(a →<sup>e</sup> b, *true*)) states that the host graph must include an :A-node such that it has no outgoing edge e to a :B-node. Moreover, we use the standard boolean operators to combine atomic GCs to form more complex ones. For instance, ∃(a, ¬(∃(a →<sup>e</sup> b, *true*) ∧ ¬∃(a ↺<sup>e</sup>, *true*))) states that the host graph must include an :A-node, such that it does not hold that there is an outgoing edge e to a :B-node and node a has no loop. In addition, as an abbreviation for readability, we may use the universal quantifier with the meaning ∀(H, φ) = ¬∃(H, ¬φ). In this sense, the condition ψ from Fig. 1, used in our running example, states that every node of type :A must have an outgoing edge to a node of type :B and that such an :A-node must have no loop.

Formally, the syntax of GCs [8], expressively equivalent to first-order logic on graphs [5], is given subsequently. This logic encodes properties of graph extensions, which must be explicitly mentioned as graph inclusions. For instance, the GC ∃(a, ¬∃(a →<sup>e</sup> b, *true*)) in simplified notation is formally given in the syntax of GCs as ∃(i<sub>H</sub>, ¬∃(a ↪ (a →<sup>e</sup> b), *true*)), where i<sub>H</sub> denotes the inclusion ∅ ↪ H with H the graph consisting of node a. This is because it expresses a property of the extension i<sub>H</sub>. Moreover, therein the GC ¬∃(a ↪ (a →<sup>e</sup> b), *true*) is actually a property of the extension a ↪ (a →<sup>e</sup> b).

Definition 1 (Graph Conditions (GCs) [8]). *The class of* graph conditions Φ<sup>GC</sup><sub>H</sub> *for the graph* H *is defined inductively:*

$$\begin{array}{l} - \ \wedge S \in \Phi_{H}^{\mathrm{GC}} \text{ if } S \subseteq_{\mathrm{fin}} \Phi_{H}^{\mathrm{GC}}.\\ - \ \neg\phi \in \Phi_{H}^{\mathrm{GC}} \text{ if } \phi \in \Phi_{H}^{\mathrm{GC}}.\\ - \ \exists (a : H \hookrightarrow H', \phi) \in \Phi_{H}^{\mathrm{GC}} \text{ if } \phi \in \Phi_{H'}^{\mathrm{GC}}. \end{array}$$

*In addition true, false,* ∨S*,* φ<sub>1</sub> ⇒ φ<sub>2</sub>*, and* ∀(a, φ) *can be used as abbreviations, with their obvious replacement.*

*A mono* m : H ↪ G satisfies *a* GC ψ ∈ Φ<sup>GC</sup><sub>H</sub>*, written* m ⊨<sub>GC</sub> ψ*, if one of the following cases applies.*

$$\begin{array}{l} - \ \psi = \wedge S \text{ and } m \models_{\mathrm{GC}} \phi \text{ for each } \phi \in S.\\ - \ \psi = \neg\phi \text{ and not } m \models_{\mathrm{GC}} \phi.\\ - \ \psi = \exists (a : H \hookrightarrow H', \phi) \text{ and } \exists q : H' \hookrightarrow G.\ q \circ a = m \wedge q \models_{\mathrm{GC}} \phi. \end{array}$$

*A graph* G *satisfies a* GC ψ ∈ Φ<sup>GC</sup><sub>∅</sub>*, written* G ⊨<sub>GC</sub> ψ *or* G ∈ ⟦ψ⟧*, if* i<sub>G</sub> ⊨<sub>GC</sub> ψ*.*
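To make the satisfaction relation of Definition 1 more concrete, the following Python sketch checks GCs on small graphs by brute force. It is only an illustration under several assumptions that are not part of the paper: a graph is encoded as a pair of dictionaries (node identifier to node type, edge identifier to a `(source, target, type)` triple), an inclusion a : H ↪ H′ is encoded implicitly by letting H′ reuse the identifiers of H, a mono is a dictionary with injective node and edge mappings, and the edge type names `toB` and `loop` are hypothetical labels chosen for the example.

```python
from itertools import permutations

TRUE = ("and", [])  # the empty conjunction abbreviates "true"


def _edge_ok(h_edge, g_edge, qn):
    """Check that an edge of H' is mapped to an edge of G compatibly with qn."""
    src, tgt, typ = h_edge
    return (qn[src], qn[tgt], typ) == g_edge


def extensions(m, h2, g):
    """Yield every mono q : H' -> G that agrees with m on the elements of H."""
    (h_nodes, h_edges), (g_nodes, g_edges) = h2, g
    new_nodes = [n for n in h_nodes if n not in m["n"]]
    new_edges = [e for e in h_edges if e not in m["e"]]
    free_nodes = [n for n in g_nodes if n not in m["n"].values()]
    free_edges = [e for e in g_edges if e not in m["e"].values()]
    for node_img in permutations(free_nodes, len(new_nodes)):
        qn = {**m["n"], **dict(zip(new_nodes, node_img))}
        if any(h_nodes[n] != g_nodes[qn[n]] for n in new_nodes):
            continue  # node types must be preserved
        for edge_img in permutations(free_edges, len(new_edges)):
            qe = {**m["e"], **dict(zip(new_edges, edge_img))}
            if all(_edge_ok(h_edges[e], g_edges[qe[e]], qn) for e in new_edges):
                yield {"n": qn, "e": qe}


def satisfies(m, cond, g):
    """Decide m |= cond following the three cases of Definition 1."""
    kind = cond[0]
    if kind == "and":
        return all(satisfies(m, c, g) for c in cond[1])
    if kind == "not":
        return not satisfies(m, cond[1], g)
    if kind == "exists":
        return any(satisfies(q, cond[2], g) for q in extensions(m, cond[1], g))
    raise ValueError(f"unknown condition kind: {kind}")


def graph_satisfies(g, cond):
    """A graph satisfies a GC if the inclusion of the empty graph does."""
    return satisfies({"n": {}, "e": {}}, cond, g)


# The GC of the running example in this encoding.
H_A = ({"a": "A"}, {})
H_AB = ({"a": "A", "b": "B"}, {"e": ("a", "b", "toB")})
H_LOOP = ({"a": "A"}, {"l": ("a", "a", "loop")})
PSI = ("not", ("exists", H_A, ("not", ("and", [
    ("exists", H_AB, TRUE),
    ("not", ("exists", H_LOOP, TRUE)),
]))))

G_OK = ({"a1": "A", "b1": "B"}, {"e1": ("a1", "b1", "toB")})
G_BAD = ({"a1": "A", "a2": "A", "b1": "B"},
         {"e1": ("a1", "b1", "toB"), "e2": ("a2", "b1", "toB"),
          "e3": ("a2", "a2", "loop")})
```

The enumeration is exponential in the pattern size and is meant only to mirror the definition clause by clause; it says nothing about how AutoGraph works internally.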

## 3 Graph Updates and Repairs

In this section, we define graph updates to formalize arbitrary modifications of graphs, graph repairs as the desired graph updates resulting in repaired graphs, as well as further desirable properties of graph updates.

In particular, it is well known that a modification or update of G<sub>1</sub> resulting in a graph G<sub>2</sub> can be represented by two inclusions or, in general, two monos, which we denote by (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>), where I represents the part of G<sub>1</sub> that is preserved by this update. Intuitively, l : I ↪ G<sub>1</sub> describes the deletion of elements from G<sub>1</sub> (i.e., all elements in G<sub>1</sub> \ l(I) are deleted) and r : I ↪ G<sub>2</sub> describes the addition of elements to I to obtain G<sub>2</sub> (i.e., all elements in G<sub>2</sub> \ r(I) are added).

Definition 2 (Graph Update). *A* (graph) update u *is a pair* (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>) *of monos. The class of all updates is denoted by* U*.*

Graph updates such as (i<sub>G</sub> : ∅ ↪ G, i<sub>G</sub> : ∅ ↪ G), where G is not the empty graph, first delete all the elements of G and then add them again via r. To rule out such updates, we define an update (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>) to be *canonical* when the graph I is as large as possible, i.e. intuitively I = G<sub>1</sub> ∩ G<sub>2</sub>. Formally:

Definition 3 (Canonical Graph Update). *If* (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>) ∈ U *and every* (l′ : I′ ↪ G<sub>1</sub>, r′ : I′ ↪ G<sub>2</sub>) ∈ U *and mono* i : I ↪ I′ *with* l′ ◦ i = l *and* r′ ◦ i = r *satisfies that* i *is an isomorphism then* (l, r) *is* canonical*, written* (l, r) ∈ U<sub>can</sub>*.*
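For the special case where both monos are plain inclusions and elements are identified by name, the intuition I = G<sub>1</sub> ∩ G<sub>2</sub> can be sketched in a few lines of Python. The encoding (a graph as a pair of frozensets of node names and of `(source, label, target)` edge triples) and the function name `canonical_update` are assumptions made for this illustration only.

```python
def canonical_update(g1, g2):
    """Return (interface, deleted, added) for the canonical update from g1 to g2.

    Assumes that g1 and g2 share element identities, so that the preserved
    part is simply the element-wise intersection of the two graphs.
    """
    (n1, e1), (n2, e2) = g1, g2
    interface = (n1 & n2, e1 & e2)  # I = G1 ∩ G2, the largest preserved part
    deleted = (n1 - n2, e1 - e2)    # elements of G1 outside l(I)
    added = (n2 - n1, e2 - e1)      # elements of G2 outside r(I)
    return interface, deleted, added


# Hypothetical example: remove a loop on a2 and add an edge from a2 to b1.
g1 = (frozenset({"a1", "a2", "b1"}),
      frozenset({("a1", "e", "b1"), ("a2", "loop", "a2")}))
g2 = (frozenset({"a1", "a2", "b1"}),
      frozenset({("a1", "e", "b1"), ("a2", "e", "b1")}))
interface, deleted, added = canonical_update(g1, g2)
```

An update with a smaller interface, for instance the empty graph, would describe the same pair of graphs but would delete and re-add the shared elements, which is exactly what the canonicity requirement excludes.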

An update u<sub>1</sub> is a sub-update (see [14]) of u whenever the modifications defined by u<sub>1</sub> are fully contained in the modifications defined by u. Intuitively, this is the case when u<sub>1</sub> can be composed with another update u<sub>2</sub> such that (a) the resulting update has the same effect as u and (b) u<sub>2</sub> does not delete any element that was added before by u<sub>1</sub>. This is stated, informally speaking, by requiring that I is the intersection (pullback) of I<sub>1</sub> and I<sub>2</sub> and that the intermediate graph G<sub>3</sub>, obtained by u<sub>1</sub> and further modified by u<sub>2</sub>, is their union (pushout).

Definition 4 (Sub-update [14]). *If* u = (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>) ∈ U*,* u<sub>1</sub> = (l<sub>1</sub> : I<sub>1</sub> ↪ G<sub>1</sub>, r<sub>1</sub> : I<sub>1</sub> ↪ G<sub>3</sub>) ∈ U*,* u<sub>2</sub> = (l<sub>2</sub> : I<sub>2</sub> ↪ G<sub>3</sub>, r<sub>2</sub> : I<sub>2</sub> ↪ G<sub>2</sub>) ∈ U*,* (r′<sub>1</sub> : I ↪ I<sub>1</sub>, l′<sub>2</sub> : I ↪ I<sub>2</sub>) *is the pullback of* (r<sub>1</sub>, l<sub>2</sub>)*, and* (r<sub>1</sub>, l<sub>2</sub>) *is the pushout of* (r′<sub>1</sub>, l′<sub>2</sub>) *then* u<sub>1</sub> *is a* sub-update *of* u*, written* u<sub>1</sub> ≤<sup>u<sub>2</sub></sup> u *or simply* u<sub>1</sub> ≤ u*.*

*Moreover, we write* u<sub>1</sub> <<sup>u<sub>2</sub></sup> u *or* u<sub>1</sub> < u *when* u<sub>1</sub> ≤<sup>u<sub>2</sub></sup> u *and not* u ≤ u<sub>1</sub>*.*

We now define graph repairs as graph updates where the result graph satisfies the given consistency constraint ψ.

Definition 5 (Graph Repair). *If* u = (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>) ∈ U*,* ψ ∈ Φ<sup>GC</sup><sub>∅</sub>*, and* G<sub>2</sub> ⊨<sub>GC</sub> ψ *then* u *is a* graph repair *or simply* repair *of* G<sub>1</sub> *with respect to* ψ*, written* u ∈ U(G<sub>1</sub>, ψ)*.*

To define a finite set of desirable repairs, we introduce the notion of least changing repairs that are repairs for which no sub-updates exist that are also repairs.

Definition 6 (Least Changing Graph Repair). *If* ψ ∈ Φ<sup>GC</sup><sub>∅</sub>*,* u = (l : I ↪ G<sub>1</sub>, r : I ↪ G<sub>2</sub>) ∈ U(G<sub>1</sub>, ψ)*, and there is no* u′ ∈ U(G<sub>1</sub>, ψ) *such that* u′ < u *then* u *is a* least changing graph repair *of* G<sub>1</sub> *with respect to* ψ*, written* u ∈ U<sub>lc</sub>(G<sub>1</sub>, ψ)*.*

Note that every least changing repair is canonical according to this definition. Moreover, the notion of least changing repairs is unrelated to other notions of repairs such as the set of all repairs that require a smallest amount of atomic modifications of the graph at hand to result in a graph satisfying the consistency constraint. For instance, a repair u<sub>1</sub> adding two nodes of type :A may be a least changing repair even if there is a repair u<sub>2</sub> adding only one node of type :B.
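The difference between "least changing" and "fewest modifications" can be illustrated with a small Python sketch. It uses a deliberately simplified, set-based reading of Definitions 4 and 6 that is an assumption of this example and not the categorical definition itself: every repair of a fixed graph is a canonical update represented as a pair `(deleted, added)` of element sets, and u′ is taken to be a sub-update of u when both of its sets are contained in those of u. The function names and element names are hypothetical.

```python
def is_sub_update(u1, u):
    """Set-based approximation: all modifications of u1 are also made by u."""
    return u1[0] <= u[0] and u1[1] <= u[1]


def least_changing(repairs):
    """Keep the repairs for which no other repair is a strictly smaller sub-update."""
    return [
        u for u in repairs
        if not any(is_sub_update(v, u) and not is_sub_update(u, v) for v in repairs)
    ]


# Hypothetical repairs of one graph, each given as (deleted, added).
add_two_a = (frozenset(), frozenset({"a3", "a4"}))
add_one_b = (frozenset(), frozenset({"b2"}))
add_two_b = (frozenset(), frozenset({"b2", "b3"}))
identity = (frozenset(), frozenset())

kept = least_changing([add_two_a, add_one_b, add_two_b])
```

Here `add_two_a` is kept although `add_one_b` performs fewer modifications, because neither is contained in the other, whereas `add_two_b` is discarded since it strictly contains `add_one_b`. Whenever the identity update is itself a repair, it is contained in every other repair and is therefore the only one kept.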

A graph repair algorithm is *stable* [12] if the repair procedure returns the identity update (id<sub>G</sub> : G ↪ G, id<sub>G</sub> : G ↪ G) when graph G is already consistent. Obviously, a graph repair algorithm that only returns least changing repairs is stable, since the identity update is a sub-update of any other repair.

## 4 State-Based Repair

In this section, we introduce two state-based graph repair algorithms (see [18] for additional technical detail), which compute a set of graph repairs restoring consistency for a given graph.

Definition 7 (State-Based Graph Repair Algorithm). *A state-based graph repair algorithm takes a graph* G *and a* GC ψ ∈ Φ<sup>GC</sup><sub>∅</sub> *as inputs and returns a set of graph repairs in* U(G, ψ)*.*

Note that the tool AutoGraph [17] can be used to verify this condition as follows: It determines the operation A that constructs a finite set of all minimal graphs satisfying a given GC ψ. Formally, A(ψ) = ∩{S ⊆ ⟦ψ⟧ | ∀G ∈ ⟦ψ⟧. ∃G′ ∈ S. ∃m : G′ ↪ G. *true*}. While AutoGraph may not terminate when computing this operation due to the inherent expressiveness of GCs, it is known that AutoGraph terminates whenever ψ is not satisfied by any graph.

The state-based algorithm Repair<sub>sb,1</sub> uses A to obtain repairs. Repair<sub>sb,1</sub> computes the set A(ψ ∧ ∃(i<sub>G</sub>, *true*)) that contains all minimal graphs that (a) satisfy ψ and (b) include a copy of G. All these extensions of G correspond to a graph repair. For our running example, we do not obtain any repair for graph **G′<sub>u</sub>** from Fig. 2 and GC *ψ* from Fig. 1 because the loop on node a<sub>2</sub> would invalidate any graph including **G′<sub>u</sub>**. We state that Repair<sub>sb,1</sub> indeed computes the non-deleting least changing graph repairs.

Theorem 1 (Functional Semantics of Repair<sub>sb,1</sub>). Repair<sub>sb,1</sub> *is* sound*, i.e.,* Repair<sub>sb,1</sub>(G, ψ) ⊆ U<sub>lc</sub>(G, ψ)*, and* complete (upon termination) *with respect to non-deleting repairs in* U<sub>lc</sub>(G, ψ)*.*

The second state-based algorithm Repairsb*,*<sup>2</sup> computes *all* least changing graph repairs. In this algorithm we use the approach of Repairsb*,*<sup>1</sup> but compute A(ψ ∧ ∃(i*<sup>G</sup><sup>c</sup>* ,*true*)) whenever an inclusion l : G*<sup>c</sup>* −→ G describes how G can be restricted to one of its subgraphs G*c*. Every graph G obtained from the application of A for one of these graphs G*<sup>c</sup>* then results in one graph repair returned by Repairsb*,*<sup>2</sup> except for those that are not least changing.

To this end, we introduce the notion of a restriction tree (see the example in Fig. 2). Its nodes are all subgraphs $G_c$ of a given graph $G$ that include the graph $G_{min}$, which is the empty graph in the state-based algorithm $\mathrm{Repair}_{sb,2}$ but not in the algorithm $\mathrm{Repair}_{db}$ in Sect. 6. Its edges are given by the inclusions that add precisely one node or edge.

Definition 8 (Restriction Tree RT). *If* $G$ *and* $G_{min}$ *are graphs and* $S = \{l : G_c \to G_p \mid G_{min} \subseteq G_c \subset G_p \subseteq G,\ l \text{ is an inclusion}\}$*,* $S'$ *is the least subset of* $S$ *such that the closure of* $S'$ *under* $\circ$ *equals* $S$*, then a* restriction tree $RT(G, G_{min})$ *is a least subset of* $S'$ *such that for any two inclusions* $l_1 : G' \to G_1 \in S'$ *and* $l_2 : G' \to G_2 \in S'$ *one of them is in* $RT(G, G_{min})$*.*

Considering our running example, the restriction tree in Fig. 2 is traversed entirely except for the four graphs without a border, which are not traversed as they have the supergraph marked 9 satisfying *ψ* and therefore traversing those would generate repairs that are not least changing. The resulting graph repairs for the condition *ψ* are given by the graphs marked by 3–6.
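To make the shape of a restriction tree concrete, the following Python sketch enumerates all subgraphs between $G_{min}$ and $G$ for a tiny graph and assigns each proper subgraph exactly one parent that has precisely one more node or edge. The graph encoding (node ids plus a dictionary from edge ids to source/target pairs), the function names, and the rule for picking the parent are assumptions made for illustration; Definition 8 only requires that one such inclusion is chosen per subgraph.

```python
from itertools import combinations


def subgraphs(nodes, edges, min_nodes=(), min_edges=()):
    """Yield every subgraph (N, E) with G_min <= (N, E) <= G.

    nodes: iterable of node ids; edges: dict edge_id -> (source, target).
    min_nodes/min_edges describe G_min (assumed to be a subgraph of G).
    """
    base_n, base_e = frozenset(min_nodes), frozenset(min_edges)
    optional_nodes = sorted(set(nodes) - base_n)
    for k in range(len(optional_nodes) + 1):
        for picked in combinations(optional_nodes, k):
            n = base_n | frozenset(picked)
            # Only edges whose endpoints are both kept may be part of E.
            optional_edges = sorted(
                e for e, (s, t) in edges.items()
                if e not in base_e and s in n and t in n
            )
            for j in range(len(optional_edges) + 1):
                for chosen in combinations(optional_edges, j):
                    yield n, base_e | frozenset(chosen)


def restriction_tree(nodes, edges, min_nodes=(), min_edges=()):
    """Map each proper subgraph to ONE parent with exactly one more element."""
    full = (frozenset(nodes), frozenset(edges))
    parent = {}
    for n, e in subgraphs(nodes, edges, min_nodes, min_edges):
        if (n, e) == full:
            continue
        addable = sorted(x for x, (s, t) in edges.items()
                         if x not in e and s in n and t in n)
        if addable:                      # arbitrary choice: prefer adding an edge
            parent[(n, e)] = (n, e | {addable[0]})
        else:                            # otherwise some node must be missing
            missing = sorted(set(nodes) - n)
            parent[(n, e)] = (n | {missing[0]}, e)
    return parent


# Hypothetical two-node graph with a single edge e from a to b.
tree = restriction_tree(["a", "b"], {"e": ("a", "b")})
```

Following the `parent` links from any subgraph leads to the full graph, which is the root of the tree; a traversal from the root therefore visits every restriction of $G$ exactly once.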

Our second state-based graph repair algorithm is indeed sound and complete whenever the calls to AutoGraph using A terminate.

Theorem 2 (Functional Semantics of $\mathrm{Repair}_{sb,2}$). $\mathrm{Repair}_{sb,2}$ *is* sound*, i.e.,* $\mathrm{Repair}_{sb,2}(G, \psi) \subseteq \mathcal{U}_{lc}(G, \psi)$*, and* complete*, i.e.,* $\mathcal{U}_{lc}(G, \psi) \subseteq \mathrm{Repair}_{sb,2}(G, \psi)$*, upon termination.*

## 5 Satisfaction Trees

The state-based algorithms introduced in the previous section are inefficient when used in a scenario where a graph needs repair after a sequence of updates that all need repair. We thus present in Sect. 6 an incremental algorithm reducing the computational cost for a repair when an update is provided. This algorithm uses an additional data structure, called *satisfaction tree* or ST, which stores information on whether and how a graph $G$ satisfies a GC $\psi$ (according to Definition 1). In this section, given $\psi$ and $G$, we define how such an ST $\gamma$ is constructed and how it is updated once the graph $G$ is updated.

Fig. 2. The restriction tree $RT(G'_u, \emptyset)$ (enclosed by the polygon) and four graph repairs (marked 3–6) generated using $\mathrm{Repair}_{sb,2}$

If $\psi$ is a conjunction of conditions, its associated ST $\gamma$ is a conjunction of STs, and if $\psi$ is a negation of a condition, its associated $\gamma$ is a negation of an ST. In the case when $\psi$ is $\exists(a : H \to H', \phi)$, recall that a match $m : H \to G$ satisfies $\psi$ if there exists a $q : H' \to G$ such that $m = q \circ a$ and $q \models_{GC} \phi$. For this case, we keep in the ST each $q$ satisfying these two conditions and also each $q$ that satisfies the first condition but not the second. More precisely, for the case of existential quantification, the corresponding ST is of the form $\exists(a : H \to H', \phi, m_t, m_f)$, where $m_t$ and $m_f$ are partial mappings (we use $\mathrm{sup}(f)$ to denote the elements actually mapped by a partial map $f$) that map matches $q : H' \to G$ that satisfy $m = q \circ a$ (for a previously known $m : H \to G$) to an ST for the subcondition $\phi$. The difference between the two partial functions is that $m_t$ maps matches $q$ to STs for which $q \models_{GC} \phi$, while $m_f$ maps matches $q$ to STs for which $q \not\models_{GC} \phi$. Consider Fig. 3b for an example of an ST $\gamma_u$.

The following definition describes the syntax of STs. The STs are defined over matches into a graph G to allow for the basic well-formedness condition that every mapped match q satisfies q ◦ a = m.

Definition 9 (Satisfaction Trees (STs)). *The class of all* Satisfaction Trees $\Gamma^{ST}_m$ *for a mono* $m : H \to G$ *contains* $\gamma$ *if one of the following cases applies.*

*–* $\gamma = \wedge S$ *and* $S \subseteq_{fin} \Gamma^{ST}_m$*.*

*–* $\gamma = \neg\chi$ *and* $\chi \in \Gamma^{ST}_m$*.*

*–* $\gamma = \exists(a, \phi, m_t, m_f)$*,* $a : H \to H'$*,* $\phi \in \Phi^{GC}_{H'}$*,* $m_t, m_f \subseteq_{fin} \{(q : H' \to G, \bar{\gamma}) \mid q \circ a = m,\ \bar{\gamma} \in \Gamma^{ST}_q\}$*, and* $m_t, m_f$ *are partial maps.*

Fig. 3. A graph update and an ST with its propagation over the graph update where GCs are underlined in STs for readability

The following satisfaction predicate $\models_{ST}$ for STs defines when an ST $\gamma$ for a mono $m$ states that the contained GC $\psi$ is satisfied by the morphism $m$.

Definition 10 (ST Satisfaction). *An* ST $\gamma \in \Gamma^{ST}_{m : H \to G}$ *is* satisfied*, written* $\models_{ST} \gamma$*, if one of the following cases applies.*

*–* $\gamma = \wedge S$ *and* $\models_{ST} \chi$ *(for each* $\chi \in S$*).*

*–* $\gamma = \neg\chi$ *and* $\not\models_{ST} \chi$*.*

*–* $\gamma = \exists(a, \phi, m_t, m_f)$ *and* $m_t \neq \emptyset$*.*
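The three cases of ST satisfaction translate directly into a recursive check. The sketch below is a minimal Python rendering under simplifying assumptions: morphisms and subconditions are kept as opaque strings, the partial maps $m_t$ and $m_f$ are stored as tuples of (match, sub-ST) pairs, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class And:
    parts: Tuple["ST", ...]            # the finite set S of sub-STs


@dataclass(frozen=True)
class Not:
    inner: "ST"                        # the negated sub-ST


@dataclass(frozen=True)
class Exists:
    a: str                             # name of the morphism a (kept opaque)
    phi: str                           # the subcondition (kept opaque)
    m_t: Tuple[Tuple[str, "ST"], ...]  # matches q whose sub-ST is satisfied
    m_f: Tuple[Tuple[str, "ST"], ...]  # matches q whose sub-ST is not satisfied


ST = Union[And, Not, Exists]


def satisfied(st: ST) -> bool:
    """Return True iff the ST is satisfied in the sense of Definition 10."""
    if isinstance(st, And):
        return all(satisfied(part) for part in st.parts)
    if isinstance(st, Not):
        return not satisfied(st.inner)
    if isinstance(st, Exists):
        return len(st.m_t) > 0         # some match is recorded as satisfying
    raise TypeError(f"not a satisfaction tree: {st!r}")


TRUE = And(())                                          # empty conjunction
no_match = Exists("a", "true", m_t=(), m_f=())          # no extension found
one_match = Exists("a", "true", m_t=(("q1", TRUE),), m_f=())
```

Here `satisfied(no_match)` is false, `satisfied(Not(no_match))` is true, and `satisfied(one_match)` is true; note that the check never inspects $m_f$, which is only needed later when repairs are derived.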

The following recursive operation constructs an ST $\gamma$ for a graph $G$ and a condition $\psi$ so that $\gamma$ represents how $G$ satisfies (or does not satisfy) $\psi$. Note that the match $m$ in the definition of STs above and in the construction of an ST below corresponds to the match $m : H \to G$ from Definition 1 that we operationalize in the following definition. For conjunction and negation, we construct the STs from the STs for the subconditions. For the case of existential quantification, we consider all morphisms $q : H' \to G$ for which the triangle $q \circ a = m$ commutes and construct the STs for the subcondition $\phi$ under this extended match $q$. The resulting STs are inserted into $m_t$ and $m_f$ according to whether they are satisfied.

Definition 11 (Construct ST *(*cst*)*). *Given* $m : H \to G$ *and* $\psi \in \Phi^{GC}_H$*, we define* $\mathrm{cst}(\psi, m) = \gamma$*, with* $\gamma \in \Gamma^{ST}_m$*, as follows.*


*If* $G$ *is a graph and* $\psi \in \Phi^{GC}_{\emptyset}$*, then* $\mathrm{cst}(\psi, G) = \mathrm{cst}(\psi, i_G)$*.*
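The case distinction of Definition 11 is not reproduced in this text. Based on the prose description preceding the definition and on the shape of the propagation operations defined later in this section, it can plausibly be read as follows. This is a reconstruction under those assumptions, not the authors' verbatim formulation.

$$\begin{array}{l}
\mathrm{cst}(\wedge S, m) = \wedge \{\mathrm{cst}(\phi, m) \mid \phi \in S\}, \\
\mathrm{cst}(\neg\phi, m) = \neg\,\mathrm{cst}(\phi, m), \\
\mathrm{cst}(\exists(a : H \to H', \phi), m) = \exists(a, \phi, m_t, m_f), \text{ where} \\
\quad m_{all} = \{(q, \mathrm{cst}(\phi, q)) \mid q : H' \to G,\ q \circ a = m\}, \\
\quad m_t = \{(q, \gamma) \in m_{all} \mid\ \models_{ST} \gamma\}, \quad m_f = m_{all} \setminus m_t.
\end{array}$$

Under this reading, every match $q$ extending $m$ along $a$ is recorded exactly once, in $m_t$ if its sub-ST is satisfied and in $m_f$ otherwise.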

This construction of STs then ensures that $\models_{ST} \gamma$ if and only if $G \models_{GC} \psi$. Note that $\models_{ST} \gamma_u$ holds for the ST $\gamma_u$ from Fig. 3b, the GC $\psi$ from Fig. 1, and the graph $G_u$ from Fig. 3.

Theorem 3 (Sound Construction of STs). *Given* $m : H \to G$*,* $\psi \in \Phi^{GC}_H$*, and* $\mathrm{cst}(\psi, m) = \gamma$*, then* $\models_{ST} \gamma$ *iff* $m \models_{GC} \psi$*.*

Subsequently, we define a propagation operation $\mathrm{ppgU}$ of an ST $\gamma$ for a graph update $u = (l : I \to G, r : I \to G')$ to obtain an ST $\gamma'$ such that $\gamma' = \mathrm{cst}(\psi, G')$ whenever $\gamma = \mathrm{cst}(\psi, G)$. This overall propagation is performed by a backward propagation of $\gamma$ for $l$ using the operation $\mathrm{ppgB}$, followed by a forward propagation of the resulting ST for $r$ using the operation $\mathrm{ppgF}$.

For backward propagation, we describe how the deletion of elements in $G$ by $l : I \to G$ affects its associated ST $\gamma$. To this end, we preserve those matches $q : H' \to G$ for which no matched elements are deleted. This is formalized by requiring a mono $q' : H' \to I$ such that $l \circ q' = q$. The matches $q$ with deleted matched elements cannot be preserved and are therefore removed.

Definition 12 (Propagate Match *(*ppgMatch*)*). *If* $q : H' \to G$ *and* $l : I \to G$ *are monos, then* $\mathrm{ppgMatch}(q, l)$ *is the unique* $q' : H' \to I$ *such that* $l \circ q' = q$ *if it exists and* $\bot$ *otherwise.*
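For finite graphs, propagating a match backward amounts to checking that every matched element survives the deletion and then pulling the match back along $l$. The following Python sketch illustrates this under the assumption that monos are represented as injective dictionaries from element ids to element ids; the function name `ppg_match` and the use of `None` for $\bot$ are choices made for this illustration.

```python
def ppg_match(q, l):
    """Return q' with l∘q' = q, or None (standing in for ⊥) if none exists.

    q: injective dict from elements of H' to elements of G.
    l: injective dict from elements of I to elements of G.
    """
    l_inverse = {g_elem: i_elem for i_elem, g_elem in l.items()}
    if any(g_elem not in l_inverse for g_elem in q.values()):
        return None                    # some matched element was deleted
    return {h_elem: l_inverse[g_elem] for h_elem, g_elem in q.items()}


# Hypothetical update: G has nodes a1, a2, b1; the deletion keeps a1 and b1.
l = {"a1": "a1", "b1": "b1"}
kept = ppg_match({"a": "a1", "b": "b1"}, l)
lost = ppg_match({"a": "a2"}, l)
```

In this example `kept` is the match `{"a": "a1", "b": "b1"}` into $I$, whereas `lost` is `None` because the node `a2` it relied on is not in the image of `l`. Uniqueness of the result follows from `l` being injective.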

The following recursive backward propagation defines how deletions affect the maps $m_t$ and $m_f$ of the given ST. That is, when $\gamma = \exists(a, \phi, m_t, m_f)$, we (a) entirely remove a mapping $(q, \chi)$ from $m_t$ or $m_f$ if $\mathrm{ppgMatch}(q, l) = \bot$ and (b) construct for a mapping $(q, \chi)$ from $m_t$ or $m_f$ the pair $(\mathrm{ppgMatch}(q, l), \chi')$, where $\chi'$ is obtained by recursively applying the backward propagation on $\chi$, when $\mathrm{ppgMatch}(q, l) \neq \bot$. The updated pair $(\mathrm{ppgMatch}(q, l), \chi')$ must be rechecked to decide to which partial map it must be added, which ensures that the resulting ST corresponds to the ST that would be constructed for the resulting graph directly.

Definition 13 (Backward Propagation *(*ppgB*)*). *If* $m : H \to G$*,* $\gamma \in \Gamma^{ST}_m$*,* $l : I \to G$*,* $\mathrm{ppgMatch}(m, l) = m' : H \to I$*, and* $\gamma' \in \Gamma^{ST}_{m'}$ *then* $\mathrm{ppgB}(\gamma, l) = \gamma'$ *if one of the following cases applies.*

$$\begin{array}{l}
\text{– } \gamma = \wedge S \text{ and } \gamma' = \wedge \{\mathrm{ppgB}(\chi, l) \mid \chi \in S\}. \\
\text{– } \gamma = \neg\chi \text{ and } \gamma' = \neg\,\mathrm{ppgB}(\chi, l). \\
\text{– } \gamma = \exists(a, \phi, m_t, m_f),\ m_{all} = \{(q', \chi') \mid (q, \chi) \in m_t \cup m_f \wedge \mathrm{ppgMatch}(q, l) = q' \neq \bot \wedge \mathrm{ppgB}(\chi, l) = \chi'\}, \\
\quad m'_t = \{(q, \chi) \in m_{all} \mid\ \models_{ST} \chi\},\ m'_f = m_{all} \setminus m'_t, \text{ and } \gamma' = \exists(a, \phi, m'_t, m'_f).
\end{array}$$

Note that $\mathrm{ppgMatch}(i_G, l) = i_I$ and, hence, the operation $\mathrm{ppgB}$ is applicable for all STs $\gamma \in \Gamma^{ST}_{i_G}$, which is sufficient as we define consistency constraints using GCs over the empty graph as well.

In the case of forward propagation, where additions are given by $r : I \to G'$, we can preserve all matches using an adaptation. However, the addition of further elements may also result in additional matches that may satisfy the conditions to be included in the corresponding $m_t$ and $m_f$ from the ST at hand.

Definition 14 (Forward Propagation *(*ppgF*)*). *If* $\gamma \in \Gamma^{ST}_{m : H \to I}$*,* $r : I \to G'$*, and* $\gamma' \in \Gamma^{ST}_{r \circ m}$ *then* $\mathrm{ppgF}(\gamma, r) = \gamma'$ *if one of the following cases applies.*

$$\begin{array}{l}
\text{– } \gamma = \wedge S \text{ and } \gamma' = \wedge \{\mathrm{ppgF}(\chi, r) \mid \chi \in S\}. \\
\text{– } \gamma = \neg\chi \text{ and } \gamma' = \neg\,\mathrm{ppgF}(\chi, r). \\
\text{– } \gamma = \exists(a, \phi, m_t, m_f),\ m_{all} = \{(r \circ q, \chi') \mid (q, \chi) \in m_t \cup m_f \wedge \mathrm{ppgF}(\chi, r) = \chi'\}\ \cup \\
\quad \{(q, \gamma_q) \mid q \circ a = r \circ m,\ (\nexists q' \in \mathrm{sup}(m_t) \cup \mathrm{sup}(m_f).\ r \circ q' = q),\ \mathrm{cst}(\phi, q) = \gamma_q\}, \\
\quad m'_t = \{(q, \chi) \in m_{all} \mid\ \models_{ST} \chi\},\ m'_f = m_{all} \setminus m'_t, \text{ and } \gamma' = \exists(a, \phi, m'_t, m'_f).
\end{array}$$

We now define the composition of both propagations to obtain the operation ppgU that updates an ST for an entire graph update.

Definition 15 (Update Propagation *(*ppgU*)*). *If* $m : H \to G$*,* $\gamma \in \Gamma^{ST}_m$*,* $l : I \to G$*,* $\mathrm{ppgMatch}(m, l) = m' : H \to I$*, and* $r : I \to G'$ *then* $\mathrm{ppgU}(\gamma, (l, r)) = \mathrm{ppgF}(\mathrm{ppgB}(\gamma, l), r) \in \Gamma^{ST}_{r \circ m'}$*.*

The overall propagation given by this operation is *incremental*, in the sense that the operation $\mathrm{cst}$ is only used in the forward propagation on parts of the graph $G'$ where the addition of graph elements by $r$ from the graph update results in additional matches $q$ according to the satisfaction relation for GCs. Finally, we state that $\mathrm{ppgU}$ incrementally computes the ST obtained using $\mathrm{cst}$. The proof of this theorem relies on the fact that this property also holds for $\mathrm{ppgB}$ and $\mathrm{ppgF}$.

Theorem 4 (ppgU is Compatible with cst). *If* $G$ *is a graph,* $\psi \in \Phi^{GC}_{\emptyset}$*,* $l : I \to G$*, and* $r : I \to G'$ *then* $\mathrm{ppgU}(\mathrm{cst}(\psi, G), (l, r)) = \mathrm{cst}(\psi, G')$*.*

## 6 Delta-Based Repair

The local states of delta-based graph repair algorithms may contain, besides the current graph as in state-based graph repair algorithms, an additional value. In our delta-based graph repair algorithm this will be an ST.

Fig. 4. An example for delta-based graph repair using $\mathrm{Repair}_{db}$

Definition 16 (Delta-Based Graph Repair Algorithm). *Delta-based graph repair algorithms take a graph* $G$*, a* GC $\psi \in \Phi^{GC}_{\emptyset}$*, and a value* $q$ *as inputs and return a set of pairs* $(u, q')$ *where* $u \in \mathcal{U}(G, \psi)$ *is a graph repair and* $q'$ *is a value.*

Our delta-based graph repair algorithm $\mathrm{Repair}_{db}$ is based on the single-step operation $\mathrm{Repair}_{db1}$. Given a graph $G$, a GC $\psi \in \Phi^{GC}_{\emptyset}$, the ST $\gamma$ that equals $\mathrm{cst}(\psi, G)$, and a graph update $u = (l : I \to G, r : I \to G')$, the algorithm $\mathrm{Repair}_{db}$ first updates $\gamma$ using $\mathrm{ppgU}$ for the graph update $u$ and then determines using $\mathrm{Repair}_{db1}$, if necessary, graph repairs for the resulting ST $\gamma'$ according to the repair rules described in the following. The algorithm $\mathrm{Repair}_{db}$ then uses $\mathrm{Repair}_{db1}$ in a breadth-first manner to obtain multi-step repairs.

For our example from Fig. 3a, such a multi-step repair of $G'_u$ is given in Fig. 4, where graph updates are obtained that result in the graphs marked 1–3, of which only the graph marked 1 satisfies $\psi$. The algorithm $\mathrm{Repair}_{db}$ then computes further graph updates resulting in the graph marked 4, which also satisfies $\psi$.

The operation $\mathrm{Repair}_{db1}$ for deriving single-step repairs depends on two local modifications. Firstly, a GC $\exists(a : H \to H', \phi)$ occurring as a subcondition in the consistency constraint $\psi$ may be violated because, for the match $m : H \to G$ that locates a copy of $H$ in the graph $G$ under repair, no suitable match $q : H' \to G$ can be found for which $q \circ a = m$ and $q \models_{GC} \phi$ are satisfied. The operation $\mathrm{Repair}_{add}$ resolves this violation by (a) using AutoGraph to construct a suitable graph $H_s$ and by (b) integrating this graph $H_s$ into $G$, resulting in $G'$, such that a suitable match $q : H' \to G'$ can be found.

Definition 17 (Local Addition Operation $\mathrm{Repair}_{add}$). *If* $a : H \to H'$*,* $\phi \in \Phi^{GC}_{H'}$*,* $m : H \to G$*,* $H_s \in \mathcal{A}(\exists(i_H, \exists(a, \phi)))$*,* $k : H \to H_s$*, and* $(\bar{m} : H_s \to G', r : G \to G')$ *is the pushout of* $(m, k)$ *then* $r \in \mathrm{Repair}_{add}(a, \phi, m)$*.*

$$\begin{array}{ccccc}
H' & \xleftarrow{\;a\;} & H & \xrightarrow{\;k\;} & H_s \\
 & & \downarrow{\scriptstyle m} & & \downarrow{\scriptstyle \bar{m}} \\
 & & G & \xrightarrow{\;r\;} & G'
\end{array}$$

In our running example, $\mathrm{Repair}_{add}$ determines a graph repair resulting in the graph marked 2 in Fig. 4. For this repair, we considered the sub-ST marked by (R2) in Fig. 3d, where the morphism $m$ matches the node $a$ from $\psi$ to the node $a_2$ in $G'_u$, but where no extension of $m$ can also match a node :B and an edge between these two nodes. The repair performed then uses the graph with nodes $a$ and $b$ and an edge $e$ from $a$ to $b$ for $H_s$, resulting in the addition of the node $b_2$ and the edge from $a_2$ to $b_2$.

Secondly, a GC $\exists(a : H \to H', \phi)$ occurring as a subcondition in the consistency constraint $\psi$ may be satisfied even though it should not be when occurring underneath some negation. Such a violation is determined, again for a given match $m : H \to G$, by some match $q : H' \to G$ satisfying $q \circ a = m$ and $q \models_{GC} \phi$. The local repair operation $\mathrm{Repair}_{del}$ repairs such an undesired satisfaction by selecting a graph $H_p$ such that $H \subseteq H_p \subset H'$ using a restriction tree (see Definition 8) and deleting $G_{del} = q(H') \setminus q(H_p)$ from $G$. Technically, we cannot use the pushout complement of $a$ and $q$ as it does not exist when edges from $G \setminus G_{del}$ are attached to nodes in $G_{del}$. Hence, we instead determine the pushout complement of two morphisms that are suitably constructed for this purpose, as given in Definition 18.

Definition 18 (Local Deletion Operation $\mathrm{Repair}_{del}$). *If* $a : H \to H'$*,* $q : H' \to G$*,* $a' : H_p \to H' \in RT(H', H)$*,* $m_1 : H' \to X_2$ *where* $X_2$ *is obtained from* $q(H')$ *by adding all edges (with their nodes) that are connected to nodes in* $q(H') \setminus q(a'(H_p))$*,* $k' : X_2 \to G$ *is obtained such that* $k' \circ m_1 = q$*,* $m_2 : H_p \to X_1$ *where* $X_1$ *is obtained from* $H_p$ *by adding all nodes in* $X_2 \setminus q(H')$*,* $a'' : X_1 \to X_2$ *is obtained such that* $a'' \circ m_2 = m_1 \circ a'$*, and* $(l : G' \to G, m' : X_1 \to G')$ *is the pushout complement of* $(a'', k')$ *then* $l \in \mathrm{Repair}_{del}(a, q)$*.*

In our example, $\mathrm{Repair}_{del}$ determines a repair resulting in the graph marked 1 in Fig. 4. For this repair, we considered the sub-ST marked by (R1) in Fig. 3d, where the mono $m$ matches the node $a$ from $\psi$ to the node $a_2$ in $G'_u$. The repair performed then uses $H_p = \emptyset$ for the removal of the node $a_2$ along with its adjacent loop (for which the technical handling in $\mathrm{Repair}_{del}$ is required).

The recursive operation $\mathrm{Repair}_{db1}$ below derives updates from an ST $\gamma$ that corresponds to the current graph $G$ (for our running example, these are $\gamma'_u$ and $G'_u$ from Fig. 3d). In the algorithm $\mathrm{Repair}_{db}$, we apply $\mathrm{Repair}_{db1}$ for the initial match $i_G$, $\gamma$, and *true*, where this boolean indicates that we want $\gamma$ to be satisfied. This boolean is changed in Rule 3 whenever the recursion is applied to an ST $\neg\gamma$ because we expect that $\gamma$ is not to be satisfied iff we expect that $\neg\gamma$ is to be satisfied. For conjunction, we either attempt to repair a sub-ST for $b = \mathit{true}$ in Rule 1 or we attempt to break one sub-ST for $b = \mathit{false}$. For existential quantification and $b = \mathit{true}$, we use $\mathrm{Repair}_{add}$ as discussed before in Rule 4 or we attempt to repair one existing match contained in $m_f$ in Rule 5. Also, for existential quantification and $b = \mathit{false}$, we use $\mathrm{Repair}_{del}$ as discussed before in Rule 6 or we attempt to break one existing match contained in $m_t$ in Rule 7.

Definition 19 (Single-Step Delta-Based Repair Algorithm $\mathrm{Repair}_{db1}$). *If* $m : H \to G$*,* $\gamma \in \Gamma^{ST}_m$*, and* $b \in \mathbf{B}$ *then* $(l : I \to G, r : I \to G') \in \mathrm{Repair}_{db1}(m, \gamma, b)$ *if one of the following cases applies.*


We define the recursive algorithm Repairdb to apply Repairdb1 to obtain repairs as iterated applications of single-step repairs computed by Repairdb1.

Definition 20 (Delta-Based Repair Algorithm $\mathrm{Repair}_{db}$). *If* $u = (l : I \to G, r : I \to G') \in \mathcal{U}$*,* $\gamma \in \Gamma^{ST}_{i_G}$*, and* $\gamma' = \mathrm{ppgU}(\gamma, u)$ *then* $\mathrm{Repair}_{db}(u, \gamma) = S$ *if one of the following cases applies.*


This computation does not terminate when repairs trigger each other ad infinitum. However, a breadth-first computation of $\mathrm{Repair}_{db}$ gradually computes a set of sound repairs. GCs that trigger such nonterminating computations should be avoided, but machinery for detecting such GCs is still needed.
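The breadth-first strategy can be illustrated independently of graphs and STs. The Python sketch below explores single-step repairs level by level, stops extending a path once the condition holds, and uses a depth bound so that repairs triggering each other indefinitely cannot cause divergence. It is not the definition of $\mathrm{Repair}_{db}$: the function names, the depth bound, the duplicate check, and the toy encoding of graphs as sets of element names are all assumptions made for illustration, and no filtering for least changing repairs is performed.

```python
from collections import deque


def repair_bfs(start, satisfies, single_step, max_depth):
    """Breadth-first search over single-step repairs (illustrative sketch).

    Yields (graph, path) for each reachable graph satisfying the condition;
    paths through satisfying graphs are not extended further.
    """
    queue = deque([(start, ())])
    seen = {start}                      # assumption: never revisit a graph
    while queue:
        graph, path = queue.popleft()
        if satisfies(graph):
            yield graph, path
            continue
        if len(path) >= max_depth:      # bound against endless repair chains
            continue
        for label, successor in single_step(graph):
            if successor not in seen:
                seen.add(successor)
                queue.append((successor, path + (label,)))


# Toy stand-in for the running example: a2 must have an edge to some b2
# and must not carry a loop. Graphs are frozensets of element names.
def satisfies(g):
    return "a2" not in g or ("loop" not in g and "edge" in g)


def single_step(g):
    steps = []
    if "loop" in g:                     # undesired match: delete parts of it
        steps.append(("delete loop", g - {"loop"}))
        steps.append(("delete a2", g - {"a2", "loop", "edge"}))
    if "a2" in g and "edge" not in g:   # missing match: add what is required
        steps.append(("add b2 and edge", g | {"b2", "edge"}))
    return steps


repairs = list(repair_bfs(frozenset({"a2", "loop"}), satisfies,
                          single_step, max_depth=3))
```

On this toy input the search reports three repaired graphs, the first of which is reached by the single step `delete a2`, while the others need two steps. Because results are produced level by level, a caller can consume the generator lazily and stop as soon as enough repairs have been found.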

Note that the algorithm $\mathrm{Repair}_{db}$ computes fewer graph repairs than $\mathrm{Repair}_{sb,2}$ because repairs are applied locally in the scope defined by the GC $\psi$. For example, no repair would be constructed resulting in the graph marked 4 in Fig. 2. In general, explicitly using bigger contexts in $\psi$ results in the additional computation of less local graph repairs. For example, the condition $\psi$ may be rephrased into $\psi' = \psi \wedge \neg\exists(a\ \ b,\ \neg\exists(a \xrightarrow{e} b,\ \mathit{true}))$ to also obtain the graph repair marked 4 in Fig. 2.

We now define the updates that we expect to be computed by $\mathrm{Repair}_{db1}$ as those that repair a single violation of the GC $\psi$, by requiring a local update to be embeddable into the resulting update via a double pushout diagram as in the DPO approach to graph transformation [16].

Definition 21 (Locally Least Changing Graph Update). *If* $G_1$ *is a graph,* $\psi \in \Phi^{GC}_{\emptyset}$*,* $G_1 \not\models_{GC} \psi$*,* $(l : I \to G_1, r : I \to G_2) \in \mathcal{U}_{lc}(G_1, \psi)$*,* $G_2 \models_{GC} \psi$*,* $X_1$ *is a minimal subgraph of* $G_1$ *with a violation of* $\psi$ *that is also a violation of* $\psi$ *in* $G_1$*, and the diagram below exists and the right part of it is a DPO diagram then* $(l, r)$ *is a* locally least changing graph update*.*

[Diagram: the local update $X_1 \leftarrow I' \rightarrow X_2$ embedded into the update $G_1 \xleftarrow{\;l\;} I \xrightarrow{\;r\;} G_2$.]

<sup>3</sup> If $u_1$ and $u_2$ are updates then $u_1 \circ u_2 = u$ if $u_1 \leq^{u_2} u$ or $u = \bot$ otherwise (see Definition 4).

$\mathrm{Repair}_{db1}$ indeed generates such locally least changing graph updates because the graph $X_1$ in this definition corresponds to the $H_1$ and the $H_2$ from an ST $\exists(a : H_1 \to H_2, \phi, m_t, m_f)$ that is subject to $\mathrm{Repair}_{add}$ and $\mathrm{Repair}_{del}$, respectively. For example, for $\mathrm{Repair}_{add}$, the graph $H_1$ in the ST determines a subgraph in $G_1$ that is a violation of the overall consistency condition given by a GC $\psi$, as its match cannot be extended to the graph $H_2$.

We now define the locally least changing graph repairs (which are to be computed by $\mathrm{Repair}_{db}$, for example the graphs marked 1 and 4 in Fig. 4) as the composition of a sequence of locally least changing updates where precisely the last graph update results in a graph satisfying the GC $\psi$.

Definition 22 (Locally Least Changing Graph Repair). *If* $G_1$ *is a graph,* $\psi \in \Phi^{GC}_{\emptyset}$*,* $\pi = (l_1 : I_1 \to G_1, r_1 : I_1 \to G_2) \ldots (l_n : I_n \to G_n, r_n : I_n \to G_{n+1})$ *is a sequence of locally least changing graph updates,* $G_1 \in \psi$ *implies* $n = 0$ *and* $l_1 = r_1 = \mathrm{id}_{G_1}$*,* $G_i \notin \psi$ *(for each* $2 \leq i \leq n$*),* $G_{n+1} \in \psi$*,* $(l, r)$ *is the iterated composition of the updates in* $\pi$*, and* $(l, r) \in \mathcal{U}(G_1, \psi)$ *is a least changing graph repair then* $(l, r)$ *is a* locally least changing graph repair*.*

We now state that our delta-based graph repair algorithm $\mathrm{Repair}_{db}$ returns all desired locally least changing graph repairs upon termination.

Theorem 5 (Functional Semantics of $\mathrm{Repair}_{db}$). $\mathrm{Repair}_{db}$ *is sound (i.e., it generates only locally least changing graph repairs) and complete (upon termination) with respect to locally least changing graph repairs.*

The state-based algorithms $\mathrm{Repair}_{sb,1}$ and $\mathrm{Repair}_{sb,2}$ are inappropriate in environments where numerous updates that may invalidate consistency are applied to a large graph because the procedure of AutoGraph has exponential cost. The incremental delta-based algorithm $\mathrm{Repair}_{db}$ is a viable alternative when additional memory requirements for storing the ST are acceptable. The AutoGraph applications for this algorithm have negligible costs because they may be performed a priori and must only be performed for subconditions of the consistency constraint, which can be assumed to feature reasonably small graphs only.

Finally, a classification of locally least changing repairs is useful for user-based repair selection. Delta preserving repairs, defined below, represent such a basic class. It contains only those repairs that preserve the update that resulted in a graph not satisfying the GC $\psi$, i.e., it may be desirable to avoid repairs that revert additions or deletions of this update. In our example, the repair related to the graph marked 4 in Fig. 4 is not delta preserving w.r.t. $u$ from Fig. 3a.

Definition 23 (Delta Preserving Graph Repair). *If* $\psi \in \Phi^{GC}_{\emptyset}$*,* $u_2 = (l_2 : I_2 \to G_2, r_2 : I_2 \to G_3) \in \mathcal{U}(G_2, \psi)$ *is a graph repair,* $u_1 = (l_1 : I_1 \to G_1, r_1 : I_1 \to G_2)$ *is a graph update, and there exists a graph update* $u$ *such that* $u_1 <^{u_2} u$ *then* $u_2$ *is a* delta preserving graph repair *with respect to* $u_1$*.*

## 7 Related Work

The recent survey on *model repair* [12] and the corresponding exhaustive classification of the primary studies selected in the literature review, published online [11], show that the number and wide variety of existing approaches make a detailed comparison with all of them infeasible.

We consider our approach to be innovative, not only because of the proposed solutions, but because it addresses the issues of *completeness* and *least changing* for incremental graph repair in a precise and formal way. From the survey [11,12] we can see that only two other approaches [10,19] address completeness and least changing, relying also on constraint-solving technology. The main difference with our approach is that they are not incremental. In particular, the work of Schoenboeck et al. [19] proposes a logic programming approach allowing the exploration of model repair solutions ranked according to some quality criteria, re-establishing conformance of a model with its metamodel. Soundness and completeness of these repair actions are not formally proven. Moreover, the least changing bidirectional model transformation approach of Macedo et al. [10] has only a bounded search for repairs, relying on a bounded constraint solver.

Some *recent work* on rule-based *graph repair* [9] (not covered by the survey) addresses the least-changing principle by developing so-called maximally preserving (items are preserved whenever possible) repair programs. This state-based approach considers a subset of the consistency constraints handled by our approach (up to nesting depth 2), and is not complete, since it produces repairs including only a minimal amount of deletions. Another recent rule-based graph repair approach [13,20] (also not covered by the survey) proposes so-called change preserving repairs (similar to what we define as delta preserving). The main difference with our work is that we do not require the user to specify consistency-preserving operations from which repairs are generated, since we derive repairs using constraint solving techniques directly from the consistency constraints.

Finally, there is a variety of work on *incremental evaluation of graph queries* (see e.g. [2,4]), developed with the aim of efficiently re-evaluating a graph query after an update has been performed. Although not employed with the specific aim of complete and least changing graph repair, this work is related to our newly introduced concept of satisfaction trees, also using specific data structures to record with some detail the set of answers to a given query (as described for graph conditions, for example, also in [3]). It is part of ongoing work to evaluate how STs can be employed similarly in this field of incremental query evaluation.

## 8 Conclusion and Future Work

We presented a logic-based incremental approach to graph repair. It is the first approach to graph repair returning a sound and complete overview of least changing repairs with respect to graph conditions equivalent to first-order logic on graphs. Technically, it relies on an existing model generation procedure for graph conditions together with the newly introduced concept of satisfaction trees, encoding if and how a graph satisfies a graph condition.

As future work, we aim at supporting partial consistency and gradually improving it. We are confident that we can extend our work to support attributes, since our underlying model generation procedure supports it. Ongoing work is the support of more expressive consistency constraints, allowing path-related properties. Moreover, we are in the process of implementing the algorithms presented here and evaluating them on a variety of case studies. The evaluation also pertains to the overall efficiency (for which we employ techniques for localized pattern matching) and includes a comparison with other approaches for graph repair. Finally, we aim at presenting new and refined properties distinguishing between all possible repairs supporting the implementation of interactive repair selection procedures.

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Software Verification II

# **DeepFault: Fault Localization for Deep Neural Networks**

Hasan Ferit Eniser<sup>1</sup>(✉), Simos Gerasimou<sup>2</sup>, and Alper Sen<sup>1</sup>

<sup>1</sup> Bogazici University, Istanbul, Turkey, {hasan.eniser,alper.sen}@boun.edu.tr

<sup>2</sup> University of York, York, UK, simos.gerasimou@york.ac.uk

**Abstract.** Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications including autonomous vehicles and medical diagnostics. To reduce the residual risk for unexpected DNN behaviour and provide evidence for their trustworthy operation, DNNs should be thoroughly tested. The DeepFault whitebox DNN testing approach presented in our paper addresses this challenge by employing suspiciousness measures inspired by fault localization to establish the hit spectrum of neurons and identify suspicious neurons whose weights have not been calibrated correctly and thus are considered responsible for inadequate DNN performance. DeepFault also uses a suspiciousness-guided algorithm to synthesize new inputs, from correctly classified inputs, that increase the activation values of suspicious neurons. Our empirical evaluation on several DNN instances trained on MNIST and CIFAR-10 datasets shows that DeepFault is effective in identifying suspicious neurons. Also, the inputs synthesized by DeepFault closely resemble the original inputs, exercise the identified suspicious neurons and are highly adversarial.

**Keywords:** Deep Neural Networks · Fault localization · Test input generation

## **1 Introduction**

Deep Neural Networks (DNNs) [33] have demonstrated human-level capabilities in several intractable machine learning tasks including image classification [10], natural language processing [56] and speech recognition [19]. These impressive achievements raised the expectations for deploying DNNs in real-world applications, especially in safety-critical domains. Early-stage applications include air traffic control [25], medical diagnostics [34] and autonomous vehicles [5]. The responsibilities of DNNs in these applications vary from carrying out well-defined tasks (e.g., detecting abnormal network activity [11]) to controlling the entire behaviour system (e.g., end-to-end learning in autonomous vehicles [5]).

This research was supported in part by Bogazici University; Research Fund 13662. © The Author(s) 2019

R. Hähnle and W. van der Aalst (Eds.): FASE 2019, LNCS 11424, pp. 171–191, 2019. https://doi.org/10.1007/978-3-030-16722-6\_10

Despite the anticipated benefits from a widespread adoption of DNNs, their deployment in safety-critical systems must be characterized by a high degree of dependability. Deviations from the expected behaviour or correct operation, as expected in safety-critical domains, can endanger human lives or cause significant financial loss. Arguably, DNN-based systems should be granted permission for use in the public domain only after exhibiting high levels of trustworthiness [6].

Software testing is the de facto instrument for analysing and evaluating the quality of a software system [24]. On the one hand, testing reduces risk by proactively finding and eliminating problems (*bugs*); on the other hand, the testing results provide evidence that the system actually achieves the required levels of safety. Research contributions and advice on best practices for testing conventional software systems are plentiful; [63], for instance, provides a comprehensive review of the state-of-the-art testing approaches.

Nevertheless, there are significant challenges in applying traditional software testing techniques for assessing the quality of DNN-based software [54]. Most importantly, the weak correlation between the behaviour of a DNN and the software used for its implementation means that the behaviour of the DNN cannot be explicitly encoded in the control flow structures of the software [51]. Furthermore, DNNs have very complex architectures, typically comprising thousands or millions of parameters, making it difficult, if not impossible, to determine a parameter's contribution to achieving a task. Likewise, since the behaviour of a DNN is heavily influenced by the data used during training, collecting enough data to exercise all potential DNN behaviour under all possible scenarios becomes a very challenging task. Hence, there is a need for systematic and effective testing frameworks for evaluating the quality of DNN-based software [6].

Recent research in the DNN testing area introduces novel white-box and black-box techniques for testing DNNs [20,28,36,37,48,54,55]. Some techniques transform valid training data into adversarial inputs through mutation-based heuristics [65], apply symbolic execution [15], combinatorial [37] or concolic testing [55], while others propose new DNN-specific coverage criteria, e.g., neuron coverage [48] and its variants [35] or MC/DC-inspired criteria [52]. We review related work in Section 6. These recent advances provide evidence that, while traditional software testing techniques are not directly applicable to testing DNNs, the sophisticated concepts and principles behind these techniques, if adapted appropriately, could be useful to the machine learning domain. Nevertheless, none of the proposed techniques uses *fault localization* [4,47,63], which can identify parts of a system that are most responsible for incorrect behaviour.

In this paper, we introduce *DeepFault*, the first fault localization-based whitebox testing approach for DNNs. The objectives of DeepFault are twofold: (i) *identification* of *suspicious* neurons, i.e., neurons likely to be more responsible for incorrect DNN behaviour; and (ii) *synthesis* of new inputs, using correctly classified inputs, that exercise the identified suspicious neurons. Similar to conventional fault localization, which receives as input faulty software and outputs a ranked list of suspicious code locations where the software may be defective [63], DeepFault *analyzes* the behaviour of neurons of a DNN after training to establish their hit spectrum and *identifies* suspicious neurons by employing suspiciousness measures. DeepFault employs a suspiciousness-guided algorithm to *synthesize* new inputs that achieve high activation values for suspicious neurons by modifying correctly classified inputs. Our empirical evaluation on the popular publicly available datasets MNIST [32] and CIFAR-10 [1] provides evidence that DeepFault can identify neurons which can be held responsible for insufficient network performance. DeepFault can also synthesize new inputs, which closely resemble the original inputs, are highly adversarial and increase the activation values of the identified suspicious neurons. To the best of our knowledge, DeepFault is the first research attempt that introduces *fault localization* for DNNs to identify suspicious neurons and synthesize new, likely adversarial, inputs.

Overall, the main contributions of this paper are:


The remainder of the paper is structured as follows. Section 2 briefly presents DNNs and fault localization in traditional software testing. Section 3 introduces *DeepFault* and Section 4 presents its open-source implementation. Section 5 describes the experimental setup, research questions and evaluation carried out. Sections 6 and 7 discuss related work and conclude the paper, respectively.

## **2 Background**

#### **2.1 Deep Neural Networks**

We consider Deep Learning software systems in which one or more system modules are controlled by DNNs [13]. A typical feed-forward DNN comprises multiple interconnected neurons organised into several layers: the *input* layer, the *output* layer and at least one *hidden* layer (Fig. 1). Each DNN layer comprises a sequence of neurons. A *neuron* denotes a computing unit that applies a *nonlinear activation function* to its inputs and transmits the result to neurons in the successive layer. Commonly used activation functions are sigmoid, hyperbolic tangent, ReLU (Rectified Linear Unit) and leaky ReLU [13]. Except for the input layer, every neuron is connected to neurons in the successive layer with *weights*, i.e., edges, whose values signify the strength of a connection between neuron pairs. Once the DNN architecture is defined, i.e., the number of layers, neurons per layer and activation functions, the DNN undergoes a *training process* using a large amount of labelled training data to find weight values that minimise a *cost function*.

**Fig. 1.** A four-layer fully-connected DNN that receives inputs from vehicle sensors (camera, LiDAR, infrared) and outputs a decision for speed, steering angle and brake.

In general, a DNN can be considered a parametric multidimensional function that consumes input data (e.g., raw image pixels) in its input layer, extracts *features*, i.e., semantic concepts, by performing a series of nonlinear transformations in its *hidden layers*, and, finally, produces a decision that matches the effect of these computations in its *output layer*.

## **2.2 Software Fault Localization**

Fault localization (FL) is a white-box testing technique that focuses on identifying source code elements (e.g., statements, declarations) that are more likely to contain faults. The general FL process [63] for traditional software uses as inputs a program $P$, corresponding to the system under test, and a test suite $T$, and employs an FL technique to test $P$ against $T$ and establish subsets that represent the passed and failed tests. Using these sets and information regarding program elements $p \in P$, the FL technique extracts fault localization data which is then employed by an FL measure to establish the "suspiciousness" of each program element $p$. Spectrum-based FL, the most studied class of FL techniques, uses program traces (called program spectra) of successful and failed test executions to establish for program element $p$ the tuple $(e_s, e_f, n_s, n_f)$. Members $e_s$ and $e_f$ ($n_s$ and $n_f$) represent the number of times the corresponding program element has been (has not been) executed by tests, with success and fail, respectively. A spectrum-based FL measure consumes this list of tuples and ranks the program elements in decreasing order of suspiciousness, enabling software engineers to inspect program elements and find faults effectively. For a comprehensive survey of state-of-the-art FL techniques, see [63].

## **3 DeepFault**

In this section, we introduce our DeepFault whitebox approach, which enables systematic testing of DNNs by identifying and localizing highly erroneous neurons across a DNN. Given a pre-trained DNN, DeepFault, whose workflow is shown in Fig. 2, performs a series of *analysis*, *identification* and *synthesis* steps to identify highly erroneous DNN neurons and synthesize new inputs that exercise erroneous neurons. We describe the DeepFault steps in Sections 3.1, 3.2 and 3.3.

We use the following notation to describe DeepFault. Let $\mathcal{N}$ be a DNN with $l$ layers. Each layer $L_i$, $1 \le i \le l$, consists of $s_i$ neurons and the total number of neurons in $\mathcal{N}$ is given by $s = \sum_{i=1}^{l} s_i$. Let also $n_{i,j}$ be the $j$-th neuron in the $i$-th layer. When the context is clear, we use $n \in \mathcal{N}$ to denote any neuron which is part of the DNN $\mathcal{N}$ irrespective of its layer. Likewise, we use $\mathcal{N}_H$ to denote the neurons which belong to the hidden layers of $\mathcal{N}$, i.e., $\mathcal{N}_H = \{n_{i,j} \mid 1 < i < l,\ 1 \le j \le s_i\}$. We use $\mathcal{T}$ to denote the set of test inputs from the input domain of $\mathcal{N}$, $t \in \mathcal{T}$ to denote a concrete input, and $u \in t$ for an element of $t$. Finally, we use the function $\phi(t, n)$ to signify the output of the activation function of neuron $n \in \mathcal{N}$ for input $t$.

#### **3.1 Neuron Spectrum Analysis**

The first step of DeepFault involves the analysis of neurons within a DNN to establish suitable neuron-based attributes that will drive the detection and localization of faulty neurons. As highlighted in recent research [18,48], the adoption of whitebox testing techniques provides additional useful insights regarding internal neuron activity and network behaviour. These insights cannot be easily extracted through black-box DNN testing, i.e., assessing the performance of a DNN considering only the decisions made given a set of test inputs $\mathcal{T}$.

**Fig. 2.** DeepFault workflow.

DeepFault initiates the identification of suspicious neurons by establishing attributes that capture a neuron's execution pattern. These attributes are defined as follows. Attributes $attr_n^{as}$ and $attr_n^{af}$ signify the number of times neuron $n$ was active (i.e., the result of the activation function $\phi(t, n)$ was above the predefined threshold) and the network made a successful or failed decision, respectively. Similarly, attributes $attr_n^{ns}$ and $attr_n^{nf}$ cover the case in which neuron $n$ is not active. DeepFault analyses the behaviour of neurons in the DNN hidden layers, under a specific test set $\mathcal{T}$, to assemble a *Hit Spectrum (HS)* for each neuron, i.e., a tuple describing its dynamic behaviour. We formally define the HS as follows.

**Definition 1.** Given a DNN $\mathcal{N}$ and a test set $\mathcal{T}$, we say that for any neuron $n \in \mathcal{N}_H$ its hit spectrum is given by the tuple $HS_n = (attr_n^{as}, attr_n^{af}, attr_n^{ns}, attr_n^{nf})$.

Note that the four attributes in each neuron's HS sum to the size of $\mathcal{T}$, since every test input falls into exactly one of the four cases.
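The construction of hit spectra can be illustrated with a minimal Python sketch. It assumes that the activations of the hidden neurons have already been recorded for every test input; the function name `hit_spectra`, the nested-list data layout and the default threshold of 0 are illustrative assumptions and are not taken from the DeepFault tool.

```python
from typing import List, Sequence, Tuple

# (attr_as, attr_af, attr_ns, attr_nf) for one neuron
HitSpectrum = Tuple[int, int, int, int]


def hit_spectra(activations: Sequence[Sequence[float]],
                correct: Sequence[bool],
                threshold: float = 0.0) -> List[HitSpectrum]:
    """Build one hit spectrum per hidden neuron.

    activations[t][n] holds phi(t, n) for test input t and hidden neuron n;
    correct[t] is True when the DNN decided test input t correctly.
    The threshold value is an assumption of this sketch.
    """
    num_neurons = len(activations[0])
    spectra = []
    for n in range(num_neurons):
        a_s = a_f = n_s = n_f = 0
        for acts, ok in zip(activations, correct):
            active = acts[n] > threshold
            if active and ok:
                a_s += 1      # active, successful decision
            elif active:
                a_f += 1      # active, failed decision
            elif ok:
                n_s += 1      # not active, successful decision
            else:
                n_f += 1      # not active, failed decision
        spectra.append((a_s, a_f, n_s, n_f))
    return spectra
```

Because each test input increments exactly one counter, the entries of every returned tuple add up to the number of test inputs, which mirrors the note above.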

Clearly, the interpretation of a hit spectrum (cf. Definition 1) is meaningful only for neurons in the hidden layers of a DNN. Since neurons within the input layer $L_1$ correspond to elements from the input domain (e.g., pixels from an image captured by a camera in Fig. 1), we consider them to be "correct-by-construction". Hence, these neurons cannot be credited or held responsible for a successful or failed decision made by the network. Furthermore, input neurons are always active and thus propagate one way or another their values to neurons in the following layer. Likewise, neurons within the output layer $L_l$ simply aggregate values from neurons in the penultimate layer $L_{l-1}$, multiplied by the corresponding weights, and thus have limited influence on the overall network behaviour and, accordingly, on decision making.

### **3.2 Suspicious Neurons Identification**

During this step, DeepFault consumes the set of hit spectra, derived from DNN analysis, and identifies *suspicious* neurons which are likely to have made significant contributions to inadequate DNN performance (low accuracy/high loss). To achieve this identification, DeepFault employs a spectrum-based suspiciousness measure which computes a suspiciousness score per neuron using spectrum-related information. Neurons with the highest suspiciousness scores are more likely to have been trained unsatisfactorily and, hence, to contribute more to incorrect DNN decisions. This indicates that the weights of these neurons need further calibration [13]. We define neuron suspiciousness as follows.


**Table 1.** Suspiciousness measures used in DeepFault




**Definition 2.** Given a neuron $n \in \mathcal{N}_H$ with $HS_n$ being its hit spectrum, a neuron's spectrum-based suspiciousness is given by the function $Susp_n : HS_n \to \mathbb{R}$.

Intuitively, a suspiciousness measure facilitates the derivation of correlations between a neuron's behaviour given a test set T and the failure pattern of T as determined by the overall network behaviour. Neurons whose behaviour pattern is *close* to the failure pattern of T are more likely to operate unreliably, and consequently, they should be assigned higher suspiciousness. Likewise, neurons whose behaviour pattern is *dissimilar* to the failure pattern of T are considered more trustworthy and their suspiciousness values should be low.

In this paper, we instantiate DeepFault with three different suspiciousness measures, i.e., Tarantula [23], Ochiai [42] and D\* [62] whose algebraic formulae are shown in Table 1. The general principle underlying these suspiciousness measures is that the more often a neuron is activated by test inputs for which the DNN made an incorrect decision, and the less often the neuron is activated by test inputs for which the DNN made a correct decision, the more suspicious the neuron is. These suspiciousness measures have been adapted from the domain of fault localization in software engineering [63] in which they have achieved competitive results in automated software debugging by isolating the root causes of software failures while reducing human input. To the best of our knowledge, DeepFault is the first approach that proposes to incorporate these suspiciousness measures into the DNN domain for the identification of defective neurons.
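As an illustration of the principle described above, the following Python sketch transcribes the standard Tarantula, Ochiai and D\* formulae from the fault localization literature to the hit-spectrum attributes of Definition 1. The mapping of $e_f, e_s, n_f, n_s$ to $attr_n^{af}, attr_n^{as}, attr_n^{nf}, attr_n^{ns}$, the handling of zero denominators and the default exponent `star=3` are assumptions of this sketch, not details confirmed by the text.

```python
import math


def tarantula(a_s: int, a_f: int, n_s: int, n_f: int) -> float:
    """Ratio of the failing-activation rate to the sum of failing and passing rates."""
    fail_rate = a_f / (a_f + n_f) if (a_f + n_f) else 0.0
    pass_rate = a_s / (a_s + n_s) if (a_s + n_s) else 0.0
    total = fail_rate + pass_rate
    return fail_rate / total if total else 0.0


def ochiai(a_s: int, a_f: int, n_s: int, n_f: int) -> float:
    """a_f divided by the geometric mean of (all failures) and (all activations)."""
    denom = math.sqrt((a_f + n_f) * (a_f + a_s))
    return a_f / denom if denom else 0.0


def dstar(a_s: int, a_f: int, n_s: int, n_f: int, star: int = 3) -> float:
    """a_f raised to the power `star`, divided by (a_s + n_f).

    Returning 0 when a_f is 0 and infinity when only the denominator is 0
    is a convention chosen for this sketch.
    """
    if a_f == 0:
        return 0.0
    denom = a_s + n_f
    return a_f ** star / denom if denom else math.inf
```

All three functions grow with $attr_n^{af}$ and shrink with $attr_n^{as}$, which is the behaviour the paragraph above describes.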

The use of suspiciousness measures in DNNs targets the identification of a set of defective neurons rather than diagnosing an isolated defective neuron. Since the output of a DNN decision task is typically based on the aggregated effects of its neurons (computation units), with each neuron making its own contribution to the whole computation procedure [13], identifying a single point of failure (i.e., a single defective neuron) has limited value. Thus, after establishing the suspiciousness of neurons in the hidden layers of a DNN, the neurons are ordered in decreasing order of suspiciousness and the $k$, $1 \le k \le s$, most probably defective (i.e., "undertrained") neurons are selected. Algorithm 1 presents the high-level steps for identifying and selecting the $k$ most suspicious neurons. When multiple neurons achieve the same suspiciousness score, DeepFault resolves ties by prioritising neurons that belong to deeper hidden layers (i.e., they are closer to the output layer). The rationale for this decision lies in the fact that neurons in deeper layers are able to learn more meaningful representations of the input space [69].
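The ranking and tie-breaking rule described above can be sketched in a few lines of Python. Neurons are identified here by a hypothetical `(layer, index)` pair, and the secondary tie-break on the neuron index within a layer is an assumption added only to make the ordering deterministic.

```python
from typing import Dict, List, Tuple

Neuron = Tuple[int, int]  # (layer index i, neuron index j), illustrative encoding


def select_suspicious(scores: Dict[Neuron, float], k: int) -> List[Neuron]:
    """Return the k most suspicious hidden neurons.

    Neurons are sorted by decreasing suspiciousness; ties are resolved in
    favour of deeper layers (larger i), then by smaller neuron index j.
    """
    ranked = sorted(scores, key=lambda n: (-scores[n], -n[0], n[1]))
    return ranked[:k]
```

For example, with two neurons scoring 0.9 in layers 2 and 3, the layer-3 neuron is ranked first.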

#### **3.3 Suspiciousness-Guided Input Synthesis**

DeepFault uses the selected k most suspicious neurons (cf. Section 3.2) to synthesize inputs that exercise these neurons and could be adversarial (see Section 5). The premise underlying the synthesis is that increasing the activation values of suspicious neurons will cause the propagation of degenerate information, computed by these neurons, across the network, thus, shifting the decision boundaries in the output layer. To achieve this, DeepFault applies targeted modification of test inputs from the test set T for which the DNN made correct decisions (e.g., for a classification task, the DNN determined correctly their ground truth classes) aiming to steer the DNN decision to a different region (see Fig. 2).

Algorithm 2 shows the high-level process for synthesising new inputs based on the identified suspicious neurons. The synthesis task is underpinned by a gradient ascent algorithm that aims at determining the extent to which a correctly classified input should be modified to increase the activation values of suspicious neurons. For any test input $t \in \mathcal{T}_s$ correctly classified by the DNN, we extract the value of each suspicious neuron and its gradient in lines 6 and 7, respectively. Then, by iterating over each input dimension $u \in t$, we determine the gradient value $u_{gradient}$ by which $u$ will be perturbed (lines 11–12). The value of $u_{gradient}$ is based on the mean gradient of $u$ across the suspicious neurons, controlled by the function GradientConstraints. This function uses a test set specific $step$ parameter and a distance parameter $d$ to facilitate the synthesis of realistic test inputs that are sufficiently *close*, according to the $L_\infty$-norm, to the original inputs. We demonstrate later in the evaluation of DeepFault (cf. Table 4) that these parameters enable the synthesis of inputs similar to the original. The function DomainConstraints applies domain-specific constraints, thus ensuring that changes to $u$ due to gradient ascent result in realistic and physically reproducible test inputs as in [48]. For instance, a domain-specific constraint for an image classification dataset involves bounding the pixel values of synthesized images to be within a certain range (e.g., 0–1 for the MNIST dataset [32]). Finally, we append the updated $u$ to construct a new test input $t'$ (line 13).
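The synthesis step described above can be sketched on a toy model in which each suspicious neuron is a leaky-ReLU unit connected directly to the input, so that its gradient has a closed form. In a real DNN the gradients would be obtained by backpropagation through the framework in use. The bodies of `gradient_constraints` and `domain_constraints` (scaling by `step`, clipping to $[-d, d]$ and clipping to $[0, 1]$) are assumptions made for illustration; the text specifies only their purpose.

```python
from typing import List, Sequence, Tuple

ALPHA = 0.01  # leaky ReLU slope for negative pre-activations

ToyNeuron = Tuple[Sequence[float], float]  # (weights, bias), illustrative


def pre_activation(neuron: ToyNeuron, t: Sequence[float]) -> float:
    weights, bias = neuron
    return sum(w * u for w, u in zip(weights, t)) + bias


def activation(neuron: ToyNeuron, t: Sequence[float]) -> float:
    z = pre_activation(neuron, t)
    return z if z > 0 else ALPHA * z


def activation_gradient(neuron: ToyNeuron, t: Sequence[float]) -> List[float]:
    """Gradient of the neuron's activation with respect to each input dimension."""
    slope = 1.0 if pre_activation(neuron, t) > 0 else ALPHA
    return [slope * w for w in neuron[0]]


def gradient_constraints(u_gradient: float, d: float, step: float) -> float:
    # Assumed behaviour: scale by step, then bound the change by d (L-infinity).
    return max(-d, min(d, step * u_gradient))


def domain_constraints(u: float, low: float = 0.0, high: float = 1.0) -> float:
    # Assumed behaviour: keep each input dimension inside the valid pixel range.
    return max(low, min(high, u))


def synthesize(t: Sequence[float], suspicious: Sequence[ToyNeuron],
               d: float, step: float) -> List[float]:
    """Perturb t along the mean gradient of the suspicious neurons."""
    grads = [activation_gradient(n, t) for n in suspicious]
    new_t = []
    for i, u in enumerate(t):
        u_gradient = sum(g[i] for g in grads) / len(grads)
        u_gradient = gradient_constraints(u_gradient, d, step)
        new_t.append(domain_constraints(u + u_gradient))
    return new_t
```

In this sketch the perturbation of every input dimension is bounded by `d`, and every synthesized value stays within the domain range, which corresponds to the two kinds of constraints discussed above.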

As we experimentally show in Section 5, DeepFault instantiated with these suspiciousness measures can synthesize adversarial inputs that cause the DNN to misclassify previously correctly classified inputs. Thus, the identified suspicious neurons can be attributed a degree of responsibility for the inadequate network performance, meaning that their weights have not been optimised. This reduces the DNN's ability to generalise and to operate correctly on data not seen during training.

## **4 Implementation**

To ease the evaluation and adoption of the DeepFault approach (cf. Fig. 2), we have implemented a prototype tool on top of the open-source machine learning framework Keras (v2.2.2) [9] with the Tensorflow (v1.10.1) backend [2]. The full experimental results summarised in the following section are available on the DeepFault project page at https://DeepFault.github.io.

## **5 Evaluation**

#### **5.1 Experimental Setup**

We evaluate DeepFault on two popular publicly available datasets. MNIST [32] is a handwritten digit dataset with 60,000 training samples and 10,000 testing samples; each input is a 28 × 28 pixel image with a class label from 0 to 9. CIFAR-10 [1] is an image dataset with 50,000 training samples and 10,000 testing samples; each input is a 32 × 32 image in ten different classes (e.g., dog, bird, car).

For each dataset, we study three DNNs that have been used in previous research [1,60] (Table 2). All DNNs have different architectures and numbers of trainable parameters. For MNIST, we use fully connected neural networks (dense) and for CIFAR-10 we use convolutional neural networks with max-pooling and dropout layers; the networks have been trained to achieve at least 95% and 70% accuracy on the provided test sets, respectively. The column 'Architecture' shows the number of fully connected hidden layers and the number of neurons per layer. Each DNN uses a leaky ReLU [38] as its activation function (α = 0.01), which has been shown to achieve competitive accuracy results [67].

We instantiate DeepFault using the suspiciousness measures Tarantula [23], Ochiai [42] and D\* [62] (Table 1). We analyse the effectiveness of DeepFault instances using different numbers of suspicious neurons, i.e., $k \in \{1, 2, 3, 5, 10\}$ and $k \in \{10, 20, 30, 40, 50\}$ for MNIST and CIFAR models, respectively. We also ran preliminary experiments for each model from Table 2 to tune the hyperparameters of Algorithm 2 and facilitate replication of our findings. Since gradient values are model and input specific, the perturbation magnitude should reflect these values and reinforce their impact. We determined empirically that $step = 1$ and $step = 10$ are good values, for MNIST and CIFAR models, respectively, that enable our algorithm to perturb inputs. We also set the maximum allowed distance $d$ to be at most 10% ($L_\infty$) with regard to the range of each input dimension (maximum pixel value). As shown in Table 4, the synthesized inputs are very similar to the original inputs and are rarely constrained by $d$. Studying other $step$ and $d$ values is part of our future work. All experiments were run on an Ubuntu server with 16 GB memory and Intel Xeon E5-2698 2.20 GHz.


**Table 2.** Details of MNIST and CIFAR-10 DNNs used in the evaluation.

## **5.2 Research Questions**

Our experimental evaluation aims to answer the following research questions.


## **5.3 Results and Discussion**

**RQ1 (Validation).** We apply the DeepFault workflow to the DNNs from Table 2. To this end, we instantiate DeepFault with a suspiciousness measure, *analyse* a pre-trained DNN given the dataset's test set $\mathcal{T}$, *identify* the $k$ neurons with the highest suspiciousness scores and *synthesize* new inputs, from *correctly classified* inputs, that exercise these suspicious neurons. Then, we measure the prediction performance of the DNN on the synthesized inputs using the standard performance metrics: cross-entropy *loss*, i.e., the divergence between output and target distribution, and *accuracy*, i.e., the percentage of correctly classified inputs over all given inputs. Note that DNN analysis is done per class, since the activation patterns of inputs from the same class are similar to each other [69].
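The two performance metrics can be made concrete with a small Python sketch. It assumes that the DNN outputs a probability distribution over classes for each input and that targets are given as integer class labels (equivalently, one-hot target distributions); the function names and the clamping constant `eps` are illustrative.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Cross-entropy between a one-hot target for `label` and the predicted distribution."""
    return -math.log(max(probs[label], eps))  # eps avoids log(0)


def evaluate(predictions: Sequence[Sequence[float]],
             labels: Sequence[int]) -> Tuple[float, float]:
    """Return (average loss, accuracy) over a set of inputs."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    hits = sum(1 for p, y in zip(predictions, labels)
               if max(range(len(p)), key=p.__getitem__) == y)
    return sum(losses) / len(losses), hits / len(labels)
```

Applied to synthesized inputs paired with the ground-truth labels of their originals, a higher loss and a lower accuracy indicate that more of the synthesized inputs are adversarial.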

Table 3 shows the average loss and accuracy for inputs synthesized by DeepFault instances using Tarantula (T), Ochiai (O), D\* (D) and a random selection strategy (R) for different numbers of suspicious neurons $k$ on the MNIST (top) and CIFAR-10 (bottom) models from Table 2. Each cell value in Table 3, except for random R, is averaged over 100 synthesized inputs (10 per class). For R, we collected 500 synthesized inputs (50 per class) over five independent runs, thus reducing the risk that our findings may have been obtained by chance.

As expected (see Table 3), DeepFault using any suspiciousness measure (T, O, D) obtained considerably lower prediction performance than R on MNIST models. The suspiciousness measures T and O are also effective on CIFAR-10 models, whereas the performance of D and R is similar. These results show that the identified $k$ neurons are actually *suspicious* and, hence, their weights are insufficiently trained. Also, we have sufficient evidence that increasing the activation value of suspicious neurons by slightly perturbing inputs that have been classified correctly by the DNN could transform them into adversarial inputs.

We applied the non-parametric statistical test Mann-Whitney with 95% confidence level [61] to check for statistically significant performance difference between the various DeepFault instances and random. We confirmed the significant difference among T-R and O-R (p-value < 0.05) for all MNIST and CIFAR-10 models and for all k values. We also confirmed the interesting observation that significant difference between D-R exists only for MNIST models (all k values). We plan to investigate this observation further in our future work.
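For readers unfamiliar with the test, the following Python sketch computes the Mann-Whitney U statistic and a two-sided p-value for two independent samples, for example per-input loss values obtained with a suspiciousness measure and with the random strategy. It uses the large-sample normal approximation without tie or continuity corrections, which is a simplification chosen for this sketch; an established statistics library would normally be preferred in practice.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p-value) via the normal approximation.

    Simplifications of this sketch: no tie correction in the variance
    and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value; ties count 1/2.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p_value
```

At the 95% confidence level, a difference between two samples is reported as significant when the returned p-value is below 0.05.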

Another interesting observation from Table 3 is the small performance difference of DeepFault instances for different $k$ values. We investigated this further by analyzing the activation values of the next $k$ most suspicious neurons according to the suspiciousness order given by Algorithm 1. For instance, if $k = 2$ we analysed the activation values of the next $k \in \{3, 5, 10\}$ most suspicious neurons. We observed that the synthesized inputs frequently increase the activation values of these next-ranked neurons, whose suspiciousness scores are also high, in addition to increasing the values of the top $k$ suspicious neurons.

Considering these results, we have empirical evidence about the existence of *suspicious* neurons which can be responsible for inadequate DNN performance. Also, we confirmed that DeepFault instances using sophisticated suspiciousness measures significantly outperform a random strategy for most of the studied DNN models (except for the D-R case on CIFAR models; see RQ3).

**RQ2 (Comparison).** We compared DeepFault instances using different suspiciousness measures and carried out pairwise comparisons using the Mann-Whitney test to check for significant difference between T, O, and D\*. We show the results of these comparisons on the project's webpage. Ochiai achieves better results on MNIST 1 and MNIST 3 models for various $k$ values. This result suggests that the suspicious neurons reported by Ochiai are more responsible for insufficient DNN performance. D\* performs competitively on MNIST 1 and MNIST 3 for $k \in \{3, 5, 10\}$, but its performance on CIFAR-10 models is significantly inferior to Tarantula and Ochiai. The best performing suspiciousness measure in CIFAR models for most $k$ values is, by a wide margin, Tarantula.

**Table 3.** Accuracy and loss of inputs synthesized by DeepFault on MNIST (top) and CIFAR-10 (bottom) datasets. The best results per suspiciousness measure are shown in bold. (k: # suspicious neurons, T: Tarantula, O: Ochiai, D: D\*, R: Random)

These findings show that multiple suspiciousness measures could be used for instantiating DeepFault with competitive performance. We also have evidence that DeepFault using D\* is ineffective for some complex networks (e.g., CIFAR-10), but there is insufficient evidence for the best performing DeepFault instance. Our findings conform to the latest research on software fault localization which claims that there is no single best spectrum-based suspiciousness measure [47].

**RQ3 (Suspiciousness Distribution).** We analysed the distribution of suspicious neurons identified by DeepFault instances across the hidden DNN layers.

**Fig. 3.** Suspicious neurons distribution on MNIST 3 (left) and CIFAR 3 (right) models.

Figure 3 shows the distribution of suspicious neurons on MNIST 3 and CIFAR 3 models with $k = 10$ and $k = 50$, respectively. Considering MNIST 3, the majority of suspicious neurons are located at the deeper hidden layers (Dense 4–Dense 8) irrespective of the suspiciousness measure used by DeepFault. This observation holds for the other MNIST models and $k$ values. On CIFAR 3, however, we can clearly see variation in the distributions across the suspiciousness measures. In fact, D\* suggests that most of the suspicious neurons belong to initial hidden layers, which is in contrast with Tarantula's recommendations. As reported in RQ2, the inputs synthesized by DeepFault using Tarantula achieved the best results on CIFAR models, thus showing that the identified neurons are actually suspicious. This difference in the distribution of suspicious neurons explains the inferior inputs synthesized by D\* on CIFAR models (Table 3).

Another interesting finding concerns the relation between the distribution of suspicious neurons and the "adversarialness" of synthesized inputs. When suspicious neurons belong to deeper hidden layers, the likelihood of the synthesized input being adversarial increases (cf. Table 3 and Fig. 3). This finding is explained by the fact that initial hidden layers transform input features (e.g., pixel values) into abstract features, while deeper hidden layers extract more semantically meaningful features and, thus, have higher influence on the final decision [13].

**RQ4 (Similarity).** We examined the distance between original, correctly classified, inputs and those synthesized by DeepFault, to establish DeepFault's ability to synthesize realistic inputs. Table 4 (left) shows the distance between original and synthesized inputs for various distance metrics ($L_1$ Manhattan, $L_2$ Euclidean, $L_\infty$ Chebyshev) for different $k$ values (# suspicious neurons). The distance values, averaged over inputs synthesized using the DeepFault suspiciousness measures (T, O and D\*), demonstrate that the degree of perturbation is similar irrespective of $k$ for MNIST models, whereas for CIFAR models the distance decreases as $k$ increases. Given that an MNIST input consists of 784 pixels, with each pixel taking values in $[0, 1]$, the average perturbation per input is less than 5.28% of the total possible perturbation ($L_1$ distance). Similarly, for a CIFAR input that comprises 3072 pixels, with each pixel taking values in $\{0, 1, \ldots, 255\}$, the average perturbation per input is less than 0.03% of the total possible perturbation ($L_1$ distance). Thus, for both datasets, the difference of synthesized inputs to their original versions is very small. We qualitatively support our findings by showing in Fig. 4 the synthesized images and their originals for an example set of inputs from the MNIST and CIFAR-10 datasets.

**Table 4.** Distance between synthesized and original inputs. The values shown represent minimal perturbation to the original inputs (< 5% for MNIST and < 1% for CIFAR-10).

**Fig. 4.** Synthesized images (top) and their originals (bottom). For each dataset, suspicious neurons are found using (from left to right) Tarantula, Ochiai, D\* and Random.
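The three distance metrics and the relative $L_1$ perturbation can be computed as in the following Python sketch. It assumes that inputs are flattened into sequences of pixel values and interprets "total possible perturbation" as the number of pixels multiplied by the pixel value range; this interpretation, like the function names, is an assumption of the sketch.

```python
import math
from typing import Dict, Sequence


def distances(original: Sequence[float], synthesized: Sequence[float]) -> Dict[str, float]:
    """L1 (Manhattan), L2 (Euclidean) and L-infinity (Chebyshev) distances."""
    diffs = [abs(a - b) for a, b in zip(original, synthesized)]
    return {
        "L1": sum(diffs),
        "L2": math.sqrt(sum(x * x for x in diffs)),
        "Linf": max(diffs),
    }


def l1_fraction(original: Sequence[float], synthesized: Sequence[float],
                value_range: float) -> float:
    """L1 distance relative to the largest possible L1 perturbation.

    The largest possible perturbation is taken to be
    (number of input dimensions) * (value range of one dimension).
    """
    return distances(original, synthesized)["L1"] / (len(original) * value_range)
```

Under this interpretation, an MNIST input with 784 pixels in $[0, 1]$ has a largest possible $L_1$ perturbation of 784, and a CIFAR input with 3072 pixels in $\{0, \ldots, 255\}$ has one of $3072 \times 255$.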

We also compare the distances between original and synthesized inputs based on the suspiciousness measures (Table 4 right). The inputs synthesized by DeepFault instances using T, O or D\* are very close to the inputs of the random selection strategy ($L_1$ distance). Considering these results, we can conclude that DeepFault is effective in synthesizing highly adversarial inputs (cf. Table 3) that closely resemble their original counterparts.

**RQ5 (Increasing Activations).** We studied the activation values of suspicious neurons identified by DeepFault to examine whether the synthesized inputs increase the values of these neurons. The gradients of suspicious neurons used in our suspiciousness-guided input synthesis algorithm might be conflicting, and a global increase in all suspicious neurons' values might not be feasible. This can occur if some neurons' gradients are negative, indicating a decrease in an input feature's value, whereas other gradients are positive and require an increase in the value of the same feature. Table 5 shows the percentage of suspicious neurons k, averaged over all suspiciousness measures for all considered MNIST and CIFAR-10 models from Table 2, whose values were increased by the inputs synthesized by DeepFault. For MNIST models, DeepFault synthesized inputs that increase the suspicious neurons' values with a success rate of at least 97% for k ∈ {1, 2, 3, 5}, while the average effectiveness for CIFAR models is 90%. These results show the effectiveness of our suspiciousness-guided input synthesis algorithm in generating inputs that increase the activation values of suspicious neurons (see https://DeepFault.github.io).

**Table 5.** Effectiveness of *suspiciousness-guided input synthesis* algorithm to increase activation values of suspicious neurons.
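The effect of conflicting gradients can be illustrated with a toy sketch. It assumes linear "neurons" a<sub>i</sub>(x) = w<sub>i</sub> · x and a single update step along the sum of their gradients; both are simplifying assumptions made here for illustration and are not taken from Algorithm 2. All names and weight values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def percent_increased(weights, x, step):
    """Percentage of linear neurons whose activation rises after one summed-gradient step."""
    # For a linear neuron a_i(x) = w_i . x, the gradient w.r.t. x is w_i itself.
    summed = [sum(column) for column in zip(*weights)]
    x_new = [xi + step * gi for xi, gi in zip(x, summed)]
    increased = sum(1 for w in weights if dot(w, x_new) > dot(w, x))
    return 100.0 * increased / len(weights)

x0 = [0.2, 0.4]
agreeing = [[1.0, 0.0], [0.5, 1.0]]      # both neurons want feature 0 to grow
conflicting = [[1.0, 0.0], [-2.0, 1.0]]  # the neurons disagree on feature 0

print(percent_increased(agreeing, x0, step=0.1))
print(percent_increased(conflicting, x0, step=0.1))
```

In the conflicting case the summed gradient lowers feature 0, so only one of the two neurons gains activation, which is the kind of shortfall from 100% that Table 5 measures.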

**RQ6 (Performance).** We measured the performance of Algorithm 2 to synthesize new inputs (https://DeepFault.github.io). The average time required to synthesize a single input for MNIST and CIFAR models is 1 s and 24.3 s, respectively. The performance of the algorithm depends on the number of suspicious neurons (k), the distribution of those neurons over the DNN and its architecture. For CIFAR models, for instance, the execution time per input ranges between 3 s (k = 10) and 48 s (k = 50). We also confirmed empirically that more time is taken to synthesize an input if the suspicious neurons are in deeper hidden layers.

#### **5.4 Threats to Validity**

**Construct validity** threats might be due to the adopted experimental methodology including the selected datasets and DNN models. To mitigate this threat, we used widely studied public datasets (MNIST [32] and CIFAR-10 [1]), and applied DeepFault to multiple DNN models of different architectures with competitive prediction accuracies (cf. Table 2). Also, we mitigate threats related to the identification of suspicious neurons (Algorithm 1) by adapting suspiciousness measures from the fault localization domain in software engineering [63].

**Internal validity** threats might occur when establishing the ability of DeepFault to synthesize new inputs that exercise the identified suspicious neurons. To mitigate this threat, we used various distance metrics to confirm that the synthesized inputs are close to the original inputs and similar to the inputs synthesized by a random strategy. Another threat could be that the suspiciousness measures employed by DeepFault accidentally outperform the random strategy. To mitigate this threat, we reported the results of the random strategy over five independent runs per experiment. Also, we ensured that the distribution of the randomly selected suspicious neurons resembles the distribution of neurons identified by DeepFault suspiciousness measures. We also used the non-parametric Mann-Whitney statistical test to check for a significant difference in the performance of DeepFault instances and random with a 95% confidence level.

**External validity** threats might exist if DeepFault cannot access the internal DNN structure to assemble the hit spectra of neurons and establish their suspiciousness. We limit this threat by developing DeepFault using the open-source frameworks Keras and TensorFlow, which enable whitebox DNN analysis. We also examined various spectrum-based suspiciousness measures, but other measures can be investigated [63]. We further reduce the risk that DeepFault might be difficult to use in practice by validating it against several DNN instances trained on two widely-used datasets. However, more experiments are needed to assess the applicability of DeepFault in domains and networks with characteristics different from those used in our evaluation (e.g., LSTM and Capsule networks [50]).

## **6 Related Work**

**DNN Testing and Verification.** The inability of blackbox DNN testing to provide insights about the internal neuron activity and enable identification of corner-case inputs that expose unexpected network behaviour [14] urged researchers to leverage whitebox testing techniques from software engineering [28,35,43,48,54]. DeepXplore [48] uses a differential algorithm to generate inputs that increase neuron coverage. DeepGauge [35] introduces multi-granularity coverage criteria for effective test synthesis. Other research proposes testing criteria and techniques inspired by metamorphic testing [58], combinatorial testing [37], mutation testing [36], MC/DC [54], symbolic execution [15] and concolic testing [55].

Formal DNN verification aims at providing guarantees for trustworthy DNN operation [20]. Abstraction refinement is used in [49] to verify safety properties of small neural networks with sigmoid activation functions, while AI<sup>2</sup> [12] employs abstract interpretation to verify similar properties. Reluplex [26] is an SMT-based approach that verifies safety and robustness of DNNs with ReLUs, and DeepSafe [16] uses Reluplex to identify safe regions in the input space. DLV [60] can verify local DNN robustness given a set of user-defined manipulations.

DeepFault adopts spectrum-based fault localization techniques to systematically identify suspicious neurons and uses these neurons to synthesize new inputs, which is mostly orthogonal to existing research on DNN testing and verification.

**Adversarial Deep Learning.** Recent studies have shown that DNNs are vulnerable to adversarial examples [57] and proposed search algorithms [8,40,41,44], based on gradient descent or optimisation techniques, for generating adversarial inputs that have a minimal difference to their original versions and force the DNN to exhibit erroneous behaviour. These types of adversarial examples have been shown to exist in the physical world too [29]. The identification of and protection against these adversarial attacks is another active area of research [45,59]. DeepFault is similar to these approaches since it uses the identified suspicious neurons to synthesize perturbed inputs which, as we have demonstrated in Section 5, are adversarial. Extending DeepFault to support the synthesis of adversarial inputs using these adversarial search algorithms is part of our future work.

**Fault Localization in Traditional Software.** Fault localization is widely studied in many software engineering areas including software debugging [46], program repair [17] and failure reproduction [21,22]. The research focus in fault localization is the development of identification methods and suspiciousness measures that isolate the root causes of software failures with reduced engineering effort [47]. The most notable fault localization methods are spectrum-based [3,23,30,31,62], slice-based [64] and model-based [39]. Threats to the value of empirical evaluations of spectrum-based fault localization are studied in [53], while the theoretical analyses in [66,68] set a formal foundation for the desirable formal properties that suspiciousness measures should have. We refer interested readers to a recent comprehensive survey on fault localization [63].

## **7 Conclusion**

The potential deployment of DNNs in safety-critical applications introduces unacceptable risks. To reduce these risks to acceptable levels, DNNs should be tested thoroughly. We contribute to this effort by introducing DeepFault, the first fault localization-based whitebox testing approach for DNNs. DeepFault *analyzes* pre-trained DNNs, given a specific test set, to establish the hit spectrum of each neuron, *identifies suspicious neurons* by employing suspiciousness measures and *synthesizes* new inputs that increase the activation values of the suspicious neurons. Our empirical evaluation on the widely-used MNIST and CIFAR-10 datasets shows that DeepFault can identify neurons which can be held responsible for inadequate performance. DeepFault can also synthesize new inputs, which closely resemble the original inputs, are highly adversarial and exercise the identified suspicious neurons. In future work, we plan to evaluate DeepFault on other DNNs and datasets, to improve the suspiciousness-guided synthesis algorithm and to extend the synthesis of adversarial inputs [44]. We will also explore techniques to repair the identified suspicious neurons, thus enabling reasoning about the safety of DNNs and supporting safety case generation [7,27].

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Variability Abstraction and Refinement for Game-Based Lifted Model Checking of Full CTL**

Aleksandar S. Dimovski<sup>1(✉)</sup>, Axel Legay<sup>2</sup>, and Andrzej Wąsowski<sup>3</sup>

<sup>1</sup> Mother Teresa University, 12 Udarna Brigada 2a, 1000 Skopje, Macedonia aleksandar.dimovski@unt.edu.mk

<sup>2</sup> UCLouvain, Belgium and IRISA/Inria Rennes, Rennes, France

<sup>3</sup> IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen, Denmark

**Abstract.** Variability models allow effective building of many custom model variants for various configurations. Lifted model checking for a variability model is capable of verifying all its variants simultaneously in a single run by exploiting the similarities between the variants. The computational cost of lifted model checking still greatly depends on the number of variants (the size of configuration space), which is often huge. One of the most promising approaches to fighting the configuration space explosion problem in lifted model checking is variability abstraction. In this work, we define a novel game-based approach for variability-specific abstraction and refinement for lifted model checking of the full CTL, interpreted over 3-valued semantics. We propose a direct algorithm for solving a 3-valued (abstract) lifted model checking game. In case the result of model checking an abstract variability model is indefinite, we suggest a new notion of refinement, which eliminates indefinite results. This provides an iterative incremental variability-specific abstraction and refinement framework, where refinement is applied only where indefinite results exist and definite results from previous iterations are reused.

## **1 Introduction**

A Software Product Line (SPL) [6] is an efficient method for systematic development of a family of related models, known as *variants* (*valid products*), from a common code base. Each variant is specified in terms of *features* (static configuration options) selected for that particular variant. SPLs are particularly popular in the embedded and critical system domains (e.g. cars, phones, avionics, healthcare).

Lifted model checking [4,5] is a useful approach for verifying properties of variability models (SPLs). Given a variability model and a specification, the lifted model checking algorithm, unlike the standard non-lifted one, returns precise conclusive results for all individual variants, that is, for each variant it reports whether it satisfies or violates the specification. The main disadvantage of lifted model checking is the *configuration space explosion problem*, which refers to the high number of variants in the variability model. In fact, exponentially many variants can be derived from only a few configuration options (features). One of the most successful approaches to fighting the configuration space explosion is the use of so-called *variability abstractions* [12,14,15,17]. They hide some of the configuration details, so that many of the concrete configurations become indistinguishable and can be collapsed into a single abstract configuration (variant). This results in smaller abstract variability models with a smaller number of abstract configurations. In order to be conservative w.r.t. the full CTL temporal logic, abstract variability models have two types of transitions: *may-transitions*, which represent possible transitions in the concrete model, and *must-transitions*, which represent the definite transitions in the concrete model. May and must transitions correspond to over- and under-approximations, and are needed in order to preserve universal and existential CTL properties, respectively.

Here we consider the 3-valued semantics for interpreting CTL formulae over abstract variability models. This semantics evaluates a formula on an abstract model to either *true*, *false*, or *indefinite*. Abstract variability models are designed to be conservative for both *true* and *false*. However, the *indefinite* answer gives no information on the value of the formula on the concrete model. In this case, a refinement is needed in order to make the abstract models more precise.

The technique proposed here significantly extends the scope of existing automatic variability-specific abstraction refinement procedures [8,18], which currently support the verification of universal LTL properties only. They use conservative variability abstractions to construct over-approximated abstract variability models, which preserve LTL properties. If a spurious counterexample (introduced due to the abstraction) is found in the abstract model, the procedures [8,18] use Craig interpolation to extract relevant information from it in order to define the refinement of abstract models. Variability abstractions that preserve all (universal and existential) CTL properties have been previously introduced [12], but without an automatic mechanism for constructing them and with no notion of refinement. The abstractions [12] have to be constructed manually by an engineer before verification. In order to make the entire verification procedure automatic, we need to develop an abstraction and refinement framework for CTL properties.

In this work, we propose the first variability-specific abstraction refinement procedure for automatically verifying arbitrary formulae of CTL. To achieve this aim, model checking *games* [24–26] represent the most suitable framework for defining the refinement. In this way, we establish a brand new connection between games and family-based (SPL) model checking. The refinement is defined by finding the reason for the indefinite result of an algorithm that solves the corresponding model checking game, which is played by two players: Player ∀ (trying to refute the formula Φ on an abstract model ℳ) and Player ∃ (trying to verify Φ on ℳ). The game is played on a *game board*, which consists of configurations of the form (s, Φ′), where s is a state of the abstract model ℳ and Φ′ is a subformula of Φ, such that the value of Φ′ in s is relevant for determining the final model checking result. The players make moves between configurations in which they try to verify or refute Φ′ in s. All possible plays of a game are captured in the game-graph, whose nodes are the elements of the game board and whose edges are the possible moves of the players. The model checking game is solved via a coloring algorithm which colors each node (s, Φ′) in the game-graph by T, F, or ? iff the value of Φ′ in s is *true*, *false*, or *indefinite*, respectively. Player ∀ has a winning strategy at the node (s, Φ′) iff the node is colored by F iff Φ′ does not hold in s, and Player ∃ has a winning strategy at (s, Φ′) iff the node is colored by T iff Φ′ holds in s. In addition, it is also possible that neither of the players has a winning strategy, in which case the node is colored by ? and the value of Φ′ in s is indefinite. In this case, we want to refine the abstract model. We can find the reason for the tie by examining the game-graph. We choose a refinement criterion, which splits abstract configurations so that the new, refined abstract configurations represent smaller subsets of concrete configurations.

## **2 Background**

*Variability Models.* Let 𝔽 = {A<sub>1</sub>, …, A<sub>n</sub>} be a finite set of Boolean variables representing the features available in a variability model. A specific subset of features, k ⊆ 𝔽, known as a *configuration*, specifies a *variant* (valid product) of a variability model. We assume that only a subset 𝕂 ⊆ 2<sup>𝔽</sup> of configurations are *valid*. An alternative representation of configurations is based upon propositional formulae. Each configuration k ∈ 𝕂 can be represented by a formula: k(A<sub>1</sub>) ∧ … ∧ k(A<sub>n</sub>), where k(A<sub>i</sub>) = A<sub>i</sub> if A<sub>i</sub> ∈ k, and k(A<sub>i</sub>) = ¬A<sub>i</sub> if A<sub>i</sub> ∉ k for 1 ≤ i ≤ n. We use *transition systems* (TS) to describe behaviors of single systems.
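The two representations of a configuration, as a set of enabled features and as a conjunction of literals, can be sketched as follows. The function names and the two-feature set are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def all_configurations(features):
    """Enumerate 2^F: every subset of the feature set, as frozensets."""
    feats = sorted(features)
    return [frozenset(c)
            for r in range(len(feats) + 1)
            for c in combinations(feats, r)]

def as_formula(config, features):
    """Render configuration k as the conjunction k(A1) ∧ ... ∧ k(An)."""
    return " ∧ ".join(a if a in config else "¬" + a for a in sorted(features))

F = {"A1", "A2"}  # hypothetical feature set
for k in all_configurations(F):
    print(sorted(k), "as formula:", as_formula(k, F))
```

A set of valid configurations 𝕂 would then be any chosen sublist of `all_configurations(F)`.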

**Definition 1.** *A transition system (TS) is a tuple* 𝒯 = (S, Act, trans, I, AP, L)*, where* S *is a set of states;* Act *is a set of actions;* trans ⊆ S × Act × S *is a transition relation which is* total*, so that for each state there is an outgoing transition;* I ⊆ S *is a set of initial states;* AP *is a set of atomic propositions; and* L : S → 2<sup>AP</sup> *is a labelling function specifying which propositions hold in a state. We write* s<sub>1</sub> →<sup>λ</sup> s<sub>2</sub> *whenever* (s<sub>1</sub>, λ, s<sub>2</sub>) ∈ trans*.*

An *execution* (behaviour) of a TS 𝒯 is an *infinite* sequence ρ = s<sub>0</sub>λ<sub>1</sub>s<sub>1</sub>λ<sub>2</sub> … with s<sub>0</sub> ∈ I such that s<sub>i</sub> →<sup>λ<sub>i+1</sub></sup> s<sub>i+1</sub> for all i ≥ 0. The *semantics* of the TS 𝒯, denoted as [[𝒯]]<sub>TS</sub>, is the set of its executions.

A *featured transition system* (FTS) is a particular instance of a variability model, which describes the behavior of a whole family of systems in a single monolithic description, where the transitions are guarded by a *presence condition* that identifies the variants they belong to. The presence conditions ψ are drawn from the set of feature expressions, *FeatExp*(𝔽), which are propositional logic formulae over 𝔽: ψ ::= *true* | A ∈ 𝔽 | ¬ψ | ψ<sub>1</sub> ∧ ψ<sub>2</sub>. We write [[ψ]] to denote the set of configurations from 𝕂 that satisfy ψ, i.e. k ∈ [[ψ]] iff k ⊨ ψ.

**Definition 2.** *A featured transition system (FTS) is a tuple* ℱ = (S, Act, trans, I, AP, L, 𝔽, 𝕂, δ)*, where* S, Act, trans, I, AP*, and* L *form a TS;* 𝔽 *is the set of available features;* 𝕂 *is a set of valid configurations; and* δ : trans → *FeatExp*(𝔽) *is a total function decorating transitions with presence conditions.*

**Fig. 1.** VendMach. **Fig. 2.** π<sub>∅</sub>(VendMach). **Fig. 3.** α<sup>join</sup>(VendMach).

The *projection* of an FTS ℱ to a configuration k ∈ 𝕂, denoted as π<sub>k</sub>(ℱ), is the TS (S, Act, trans′, I, AP, L), where trans′ = {t ∈ trans | k ⊨ δ(t)}. We lift the definition of *projection* to sets of configurations 𝕂′ ⊆ 𝕂, denoted as π<sub>𝕂′</sub>(ℱ), by keeping the transitions admitted by at least one of the configurations in 𝕂′. That is, π<sub>𝕂′</sub>(ℱ) is the FTS (S, Act, trans′, I, AP, L, 𝔽, 𝕂′, δ′), where trans′ = {t ∈ trans | ∃k ∈ 𝕂′. k ⊨ δ(t)} and δ′ = δ|<sub>trans′</sub> is the restriction of δ to trans′. The *semantics* of an FTS ℱ, denoted as [[ℱ]]<sub>FTS</sub>, is the union of behaviours of the projections on all valid variants k ∈ 𝕂, i.e. [[ℱ]]<sub>FTS</sub> = ∪<sub>k∈𝕂</sub> [[π<sub>k</sub>(ℱ)]]<sub>TS</sub>.
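Both forms of projection can be sketched in a few lines. The encoding is an assumption made for illustration: a transition is a tuple (source, action, target, guard), and a presence condition is a Python predicate over a configuration given as a set of enabled features. The three-state `TOY_FTS` and its features `c` and `f` are hypothetical.

```python
def project(fts, config):
    """π_k: keep the transitions whose presence condition holds for configuration k."""
    return [(s, a, t) for (s, a, t, guard) in fts if guard(config)]

def project_set(fts, configs):
    """π_K': keep the transitions admitted by at least one configuration in K'.

    The guards are kept, mirroring the restriction of δ to the surviving transitions.
    """
    return [tr for tr in fts if any(tr[3](k) for k in configs)]

# Hypothetical FTS: each transition is (source, action, target, presence condition).
TOY_FTS = [
    ("s0", "pay",    "s1", lambda k: True),
    ("s0", "free",   "s2", lambda k: "f" in k),
    ("s1", "cancel", "s0", lambda k: "c" in k),
    ("s1", "serve",  "s2", lambda k: True),
    ("s2", "take",   "s0", lambda k: True),
]

print(project(TOY_FTS, frozenset()))       # variant with no feature enabled
print(project(TOY_FTS, frozenset({"c"})))  # variant with only c enabled
```

Projecting to a single configuration yields a plain TS, whereas `project_set` yields a smaller FTS over the reduced configuration set.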

*Modal transition systems* (MTSs) [22] are a generalization of transition systems equipped with two transition relations: *must* and *may*. The former (must) is used to specify the required behavior, while the latter (may) to specify the allowed behavior of a system. We will use MTSs for representing abstractions of FTSs.

**Definition 3.** *A modal transition system (MTS) is represented by a tuple* ℳ = (S, Act, trans<sup>may</sup>, trans<sup>must</sup>, I, AP, L)*, where* trans<sup>may</sup> ⊆ S × Act × S *describes may transitions of* ℳ*;* trans<sup>must</sup> ⊆ S × Act × S *describes must transitions of* ℳ*, such that* trans<sup>may</sup> *is total and* trans<sup>must</sup> ⊆ trans<sup>may</sup>*.*

A *may-execution* in ℳ is an execution (infinite sequence) with all its transitions in trans<sup>may</sup>; whereas a *must-execution* in ℳ is a maximal sequence with all its transitions in trans<sup>must</sup>, which cannot be extended with any other transition from trans<sup>must</sup>. Note that since trans<sup>must</sup> is not necessarily total, must-executions can be finite. We use [[ℳ]]<sup>may</sup><sub>MTS</sub> (resp., [[ℳ]]<sup>must</sup><sub>MTS</sub>) to denote the set of all may-executions (resp., must-executions) in ℳ starting in an initial state.

*Example 1.* Throughout this paper, we will use a beverage vending machine as a running example [4]. Figure 1 shows the FTS of a VendMach family. It has two features, and each of them is assigned an identifying letter and a color. The features are: CancelPurchase (c, in brown), for canceling a purchase after a coin is entered; and FreeDrinks (f, in blue), for offering free drinks. Each transition is labeled by an *action* followed by a *feature expression*. For instance, the transition s<sub>0</sub> →<sup>*free*/f</sup> s<sub>2</sub> is included in variants where the feature f is enabled. For clarity, we omit the presence condition *true* in transitions. There is only one atomic proposition served ∈ AP, which is abbreviated as *r*. Note that *r* ∈ L(s<sub>2</sub>), whereas *r* ∉ L(s<sub>0</sub>) and *r* ∉ L(s<sub>1</sub>).

By combining various features, a number of variants of this VendMach can be obtained. The set of valid configurations is: 𝕂<sub>VM</sub> = {∅, {c}, {f}, {c, f}} (or, equivalently, 𝕂<sub>VM</sub> = {¬c ∧ ¬f, c ∧ ¬f, ¬c ∧ f, c ∧ f}). Figure 2 shows a basic version of VendMach that only serves a drink, described by the configuration ∅ (or, as a formula, ¬c ∧ ¬f). It takes a coin, serves a drink, and opens a compartment so the customer can take the drink. Figure 3 shows an MTS, where must transitions are denoted by solid lines, while may transitions are denoted by dashed lines.

*CTL Properties.* We present Computation Tree Logic (CTL) [1] for specifying system properties. CTL state formulae Φ are given by:

$$\Phi ::= true \mid false \mid l \mid \Phi_1 \land \Phi_2 \mid \Phi_1 \lor \Phi_2 \mid A\phi \mid E\phi, \qquad \phi ::= \bigcirc \Phi \mid \Phi_1 \mathsf{U} \Phi_2 \mid \Phi_1 \mathsf{V} \Phi_2$$

where l ∈ Lit = *AP* ∪ {¬a | a ∈ *AP*} and φ represents CTL path formulae. Note that the CTL state formulae Φ are given in negation normal form (¬ is applied only to atomic propositions). The path formula ○Φ can be read as "in the next state Φ", Φ<sub>1</sub>UΦ<sub>2</sub> can be read as "Φ<sub>1</sub> until Φ<sub>2</sub>", and its dual Φ<sub>1</sub>VΦ<sub>2</sub> can be read as "Φ<sub>2</sub> while not Φ<sub>1</sub>" (where Φ<sub>1</sub> may never hold).

We assume the standard CTL semantics over TSs is given [1] (see also [16, Appendix A]). We write [𝒯 ⊨ Φ] = *tt* to denote that 𝒯 satisfies the formula Φ, whereas [𝒯 ⊨ Φ] = *ff* to denote that 𝒯 does not satisfy Φ.

We say that an FTS ℱ satisfies a CTL formula Φ, written [ℱ ⊨ Φ] = *tt*, iff all its valid variants satisfy the formula, i.e. ∀k ∈ 𝕂. [π<sub>k</sub>(ℱ) ⊨ Φ] = *tt*. Otherwise, we say ℱ does not satisfy Φ, written [ℱ ⊨ Φ] = *ff*. In this case, we also want to determine a non-empty set of violating variants 𝕂′ ⊆ 𝕂, such that ∀k′ ∈ 𝕂′. [π<sub>k′</sub>(ℱ) ⊨ Φ] = *ff* and ∀k ∈ 𝕂\𝕂′. [π<sub>k</sub>(ℱ) ⊨ Φ] = *tt*.
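The definition above suggests a brute-force baseline: project the FTS to each valid configuration and check every variant separately. The sketch below does this for the single property A(¬r U r), which holds at a state iff r holds there or all successors already satisfy it (a least fixpoint, assuming a total transition relation). The encoding of transitions and guards, the function names, and the three-state `TOY_FTS` are hypothetical; this is not the lifted algorithm of the paper.

```python
def project(fts, config):
    """π_k: keep the transitions whose presence condition holds for configuration k."""
    return [(s, a, t) for (s, a, t, guard) in fts if guard(config)]

def holds_a_until(transitions, labels, init, prop):
    """Check A(¬prop U prop) at `init` by a least-fixpoint computation."""
    states = set(labels)
    succ = {s: {t for (u, _, t) in transitions if u == s} for s in states}
    good = {s for s in states if prop in labels[s]}
    changed = True
    while changed:
        changed = False
        for s in states - good:
            # s satisfies the formula once all of its successors do
            if succ[s] and succ[s] <= good:
                good.add(s)
                changed = True
    return init in good

def violating_variants(fts, configs, labels, init, prop):
    """Return the configurations K' whose projections violate the property."""
    return [k for k in configs
            if not holds_a_until(project(fts, k), labels, init, prop)]

# Hypothetical FTS with features c and f; r holds only in s2.
TOY_FTS = [
    ("s0", "pay",    "s1", lambda k: True),
    ("s0", "free",   "s2", lambda k: "f" in k),
    ("s1", "cancel", "s0", lambda k: "c" in k),
    ("s1", "serve",  "s2", lambda k: True),
    ("s2", "take",   "s0", lambda k: True),
]
LABELS = {"s0": set(), "s1": set(), "s2": {"r"}}
K = [frozenset(), frozenset({"c"}), frozenset({"f"}), frozenset({"c", "f"})]

print(violating_variants(TOY_FTS, K, LABELS, "s0", "r"))
```

In this toy model the violating variants are exactly those with `c` enabled, because the `cancel` transition allows the loop between s0 and s1 to avoid s2 forever. The cost of this baseline grows with the number of configurations, which is the explosion that lifted model checking and variability abstraction aim to avoid.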

We define the 3-valued semantics of CTL over an MTS ℳ slightly differently from the semantics for TSs. A CTL state formula Φ is satisfied in a state s of an MTS ℳ, denoted [ℳ, s ⊨<sup>3</sup> Φ], iff (ℳ is omitted when clear from context):<sup>1</sup>

$$\begin{aligned}
(1)\quad [s \models^3 a] &= \begin{cases} \mathit{tt}, & \text{if } a \in L(s) \\ \mathit{ff}, & \text{if } a \notin L(s) \end{cases}, \qquad [s \models^3 \neg a] = \begin{cases} \mathit{tt}, & \text{if } a \notin L(s) \\ \mathit{ff}, & \text{if } a \in L(s) \end{cases} \\
(2)\quad [s \models^3 \Phi_1 \land \Phi_2] &= \begin{cases} \mathit{tt}, & \text{if } [s \models^3 \Phi_1] = \mathit{tt} \text{ and } [s \models^3 \Phi_2] = \mathit{tt} \\ \mathit{ff}, & \text{if } [s \models^3 \Phi_1] = \mathit{ff} \text{ or } [s \models^3 \Phi_2] = \mathit{ff} \\ \bot, & \text{otherwise} \end{cases} \\
(3)\quad [s \models^3 A\phi] &= \begin{cases} \mathit{tt}, & \text{if } \forall \rho \in [[\mathcal{M}]]_{MTS}^{\mathit{may},s}.\ [\rho \models^3 \phi] = \mathit{tt} \\ \mathit{ff}, & \text{if } \exists \rho \in [[\mathcal{M}]]_{MTS}^{\mathit{must},s}.\ [\rho \models^3 \phi] = \mathit{ff} \\ \bot, & \text{otherwise} \end{cases} \\
[s \models^3 E\phi] &= \begin{cases} \mathit{tt}, & \text{if } \exists \rho \in [[\mathcal{M}]]_{MTS}^{\mathit{must},s}.\ [\rho \models^3 \phi] = \mathit{tt} \\ \mathit{ff}, & \text{if } \forall \rho \in [[\mathcal{M}]]_{MTS}^{\mathit{may},s}.\ [\rho \models^3 \phi] = \mathit{ff} \\ \bot, & \text{otherwise} \end{cases}
\end{aligned}$$

where [[ℳ]]<sup>may,s</sup><sub>MTS</sub> (resp., [[ℳ]]<sup>must,s</sup><sub>MTS</sub>) denotes the set of all may-executions (must-executions) starting in the state s of ℳ. Satisfaction of a path formula φ for a may- or must-execution ρ = s<sub>0</sub>λ<sub>1</sub>s<sub>1</sub>λ<sub>2</sub> … of an MTS ℳ (we write ρ<sub>i</sub> = s<sub>i</sub> to denote the i-th state of ρ, and |ρ| to denote the number of states in ρ), denoted [ℳ, ρ ⊨<sup>3</sup> φ], is defined as (ℳ is omitted when clear from context):

$$(4)\quad [\rho \models^3 (\Phi_1 \mathsf{U} \Phi_2)] = \begin{cases} \mathit{tt}, & \text{if } \exists\, 0 \le i \le |\rho|.\ \big([\rho_i \models^3 \Phi_2] = \mathit{tt} \land (\forall j < i.\ [\rho_j \models^3 \Phi_1] = \mathit{tt})\big) \\ \mathit{ff}, & \text{if } \forall\, 0 \le i \le |\rho|.\ \big((\forall j < i.\ [\rho_j \models^3 \Phi_1] \neq \mathit{ff}) \implies [\rho_i \models^3 \Phi_2] = \mathit{ff}\big) \land \big((\forall i \ge 0.\ [\rho_i \models^3 \Phi_1] \neq \mathit{ff}) \implies |\rho| = \infty\big) \\ \bot, & \text{otherwise} \end{cases}$$

<sup>1</sup> See [16, Appendix A] for definitions of [s ⊨<sup>3</sup> Φ<sub>1</sub> ∨ Φ<sub>2</sub>], [ρ ⊨<sup>3</sup> ○Φ], and [ρ ⊨<sup>3</sup> (Φ<sub>1</sub>VΦ<sub>2</sub>)].

An MTS ℳ satisfies a formula Φ, written [ℳ ⊨<sup>3</sup> Φ] = *tt*, iff ∀s<sub>0</sub> ∈ I. [s<sub>0</sub> ⊨<sup>3</sup> Φ] = *tt*. We say that [ℳ ⊨<sup>3</sup> Φ] = *ff* if ∃s<sub>0</sub> ∈ I. [s<sub>0</sub> ⊨<sup>3</sup> Φ] = *ff*. Otherwise, [ℳ ⊨<sup>3</sup> Φ] = ⊥.
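The way the three truth values combine can be sketched directly. The snippet assumes the usual Kleene-style reading in which ⊥ is returned whenever neither the *tt* nor the *ff* condition applies; the string constants and function names are hypothetical.

```python
TT, FF, BOT = "tt", "ff", "⊥"

def and3(a, b):
    """3-valued conjunction: tt if both are tt, ff if either is ff, ⊥ otherwise."""
    if a == TT and b == TT:
        return TT
    if a == FF or b == FF:
        return FF
    return BOT

def mts_verdict(initial_values):
    """Lift the verdicts at the initial states to a verdict for the whole MTS."""
    if all(v == TT for v in initial_values):
        return TT
    if any(v == FF for v in initial_values):
        return FF
    return BOT

print(and3(TT, BOT))           # indefinite: the second conjunct is unknown
print(and3(FF, BOT))           # definite: one false conjunct is enough
print(mts_verdict([TT, BOT]))  # indefinite: no initial state refutes the formula
```

A single *ff* is decisive, whereas a single ⊥ only blocks a *tt* verdict, which is why an indefinite result calls for refinement rather than for a counterexample.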

*Example 2.* Consider the FTS VendMach and the MTS α<sup>join</sup>(VendMach) in Figs. 1 and 3. The property Φ<sub>1</sub> = A(¬*r* U *r*) states that every execution from the initial state eventually reaches a state where *r* holds. Note that [VendMach ⊨ Φ<sub>1</sub>] = *ff*. E.g., if the feature c is enabled, a counterexample where the state s<sub>2</sub> that satisfies *r* is never reached is: s<sub>0</sub> → s<sub>1</sub> → s<sub>0</sub> → …. The set of violating products is [[c]] = {{c}, {f, c}} ⊆ 𝕂<sub>VM</sub>. However, [π<sub>[[¬c]]</sub>(VendMach) ⊨ Φ<sub>1</sub>] = *tt*. We also have that [α<sup>join</sup>(VendMach) ⊨<sup>3</sup> Φ<sub>1</sub>] = ⊥, since (1) there is a may-execution in α<sup>join</sup>(VendMach) where s<sub>2</sub> is never reached: s<sub>0</sub> → s<sub>1</sub> → s<sub>0</sub> → …, and (2) there is no must-execution that violates Φ<sub>1</sub>.

Consider the property Φ<sub>2</sub> = E(¬*r* U *r*), which describes a situation where in the initial state there exists an execution that will eventually reach s<sub>2</sub> that satisfies *r*. Note that [VendMach ⊨ Φ<sub>2</sub>] = *tt*, since even for variants with the feature c there is a continuation from the state s<sub>1</sub> to s<sub>2</sub>. But [α<sup>join</sup>(VendMach) ⊨<sup>3</sup> Φ<sub>2</sub>] = ⊥, since (1) there is no must-execution in α<sup>join</sup>(VendMach) that reaches s<sub>2</sub> from s<sub>0</sub>, and (2) there is a may-execution that satisfies Φ<sub>2</sub>.

## **3 Abstraction of FTSs**

We now introduce the variability abstractions [12] which preserve full CTL. We start working with Galois connections<sup>2</sup> between Boolean complete lattices of feature expressions, and then induce a notion of abstraction of FTSs.

The Boolean complete lattice of feature expressions (propositional formulae over 𝔽) is: (*FeatExp*(𝔽)/≡, ⊨, ∨, ∧, *true*, *false*, ¬). The elements of the domain *FeatExp*(𝔽)/≡ are equivalence classes of propositional formulae ψ ∈ *FeatExp*(𝔽) obtained by quotienting by the semantic equivalence ≡. The ordering ⊨ is the standard entailment between propositional logic formulae, whereas the least upper bound and the greatest lower bound are just logical disjunction and conjunction, respectively. Finally, the constant *false* is the least element, *true* is the greatest element, and negation is the complement operator.

<sup>2</sup> $\langle L, \le_L \rangle \underset{\alpha}{\overset{\gamma}{\rightleftarrows}} \langle M, \le_M \rangle$ is a Galois connection between complete lattices $L$ (concrete domain) and $M$ (abstract domain) iff $\alpha : L \to M$ and $\gamma : M \to L$ are total functions that satisfy: $\alpha(l) \le_M m \iff l \le_L \gamma(m)$, for all $l \in L$, $m \in M$.

*Over-approximating abstractions.* The *join abstraction*, $\alpha^{\text{join}}$, replaces each feature expression $\psi$ with *true* if there exists at least one configuration from $\mathbb{K}$ that satisfies $\psi$. The abstract set of features is empty: $\alpha^{\text{join}}(\mathbb{F}) = \emptyset$, and the abstract set of configurations is a singleton: $\alpha^{\text{join}}(\mathbb{K}) = \{\mathit{true}\}$. The abstraction and concretization functions between $\mathit{FeatExp}(\mathbb{F})$ and $\mathit{FeatExp}(\emptyset)$ are:

$$\alpha^{\text{join}}(\psi) = \begin{cases} \mathit{true} & \text{if } \exists k \in \mathbb{K}.\, k \models \psi \\ \mathit{false} & \text{otherwise} \end{cases} \qquad \gamma^{\text{join}}(\psi) = \begin{cases} \mathit{true} & \text{if } \psi \text{ is } \mathit{true} \\ \bigvee_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} k & \text{if } \psi \text{ is } \mathit{false} \end{cases}$$

which form a Galois connection [15]. In this way, we obtain a single abstract variant that includes all transitions occurring in any variant.
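The join abstraction can be illustrated with a minimal sketch. It assumes that a configuration is a set of enabled features and that a feature expression is modelled as a Python predicate over configurations; the names `alpha_join`, `K_VM`, `has_c` and `c_and_not_c` are illustrative assumptions, not part of the original formalization.

```python
from typing import Callable, FrozenSet, Iterable

Config = FrozenSet[str]              # a configuration = set of enabled features
FeatExp = Callable[[Config], bool]   # a feature expression, modelled as a predicate


def alpha_join(psi: FeatExp, configs: Iterable[Config]) -> bool:
    """Join abstraction: true iff at least one valid configuration satisfies psi."""
    return any(psi(k) for k in configs)


# Assumed configuration space: every combination of the features c and f.
K_VM = [frozenset(), frozenset({"c"}), frozenset({"f"}), frozenset({"c", "f"})]


def has_c(k: Config) -> bool:
    return "c" in k


def c_and_not_c(k: Config) -> bool:
    return "c" in k and "c" not in k


print(alpha_join(has_c, K_VM))        # True: some configuration enables c
print(alpha_join(c_and_not_c, K_VM))  # False: no configuration satisfies it
```

Under this encoding, any satisfiable presence condition collapses to *true*, which is why the abstract model keeps every transition that occurs in at least one variant.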

*Under-approximating abstractions.* The *dual join abstraction*, $\widetilde{\alpha^{\text{join}}}$, replaces each feature expression $\psi$ with *true* if all configurations from $\mathbb{K}$ satisfy $\psi$. The abstraction and concretization functions between $\mathit{FeatExp}(\mathbb{F})$ and $\mathit{FeatExp}(\emptyset)$, forming a Galois connection [12], are defined as [9]: $\widetilde{\alpha^{\text{join}}} = \neg \circ \alpha^{\text{join}} \circ \neg$ and $\widetilde{\gamma^{\text{join}}} = \neg \circ \gamma^{\text{join}} \circ \neg$, that is:

$$\widetilde{\alpha^{\text{join}}}(\psi) = \begin{cases} \mathit{true} & \text{if } \forall k \in \mathbb{K}.\, k \models \psi \\ \mathit{false} & \text{otherwise} \end{cases} \qquad \widetilde{\gamma^{\text{join}}}(\psi) = \begin{cases} \bigwedge_{k \in 2^{\mathbb{F}} \setminus \mathbb{K}} (\neg k) & \text{if } \psi \text{ is } \mathit{true} \\ \mathit{false} & \text{if } \psi \text{ is } \mathit{false} \end{cases}$$

In this way, we obtain a single abstract variant that includes only those transitions that occur in all variants.
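The duality between the two abstractions can be checked on a small example. The sketch below again assumes feature expressions are predicates over configurations (an encoding chosen for illustration); it writes the dual join literally as the composition of negation, join, and negation, and compares it with the "for all configurations" characterization.

```python
def alpha_join(psi, configs):
    """Join abstraction: some configuration satisfies psi."""
    return any(psi(k) for k in configs)


def alpha_dual_join(psi, configs):
    """Dual join, written literally as  not . alpha_join . not."""
    return not alpha_join(lambda k: not psi(k), configs)


# Assumed configuration space over two features c and f.
K = [frozenset(), frozenset({"c"}), frozenset({"f"}), frozenset({"c", "f"})]

expressions = {
    "true": lambda k: True,
    "c": lambda k: "c" in k,
    "c or not f": lambda k: "c" in k or "f" not in k,
    "false": lambda k: False,
}

for name, psi in expressions.items():
    # The composed definition agrees with the "for all k" characterization.
    assert alpha_dual_join(psi, K) == all(psi(k) for k in K)
    print(name, alpha_join(psi, K), alpha_dual_join(psi, K))
```

Only the expression that holds in every configuration survives the dual join, which matches the intuition that must-behaviour is behaviour common to all variants.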

*Abstract MTS and Preservation of CTL.* Given the Galois connection $(\alpha^{\text{join}}, \gamma^{\text{join}})$ defined at the level of feature expressions, we now define the abstraction of an FTS as an MTS with two transition relations: one (may) preserving universal properties, and the other (must) preserving existential properties. The may transitions describe behaviour that is possible in some variant of the concrete FTS but need not be realized in the other variants, whereas the must transitions describe behaviour that has to be present in all variants of the FTS.

**Definition 4.** *Given the FTS* $\mathcal{F} = (S, \mathit{Act}, \mathit{trans}, I, \mathit{AP}, L, \mathbb{F}, \mathbb{K}, \delta)$*, define the MTS* $\alpha^{\text{join}}(\mathcal{F}) = (S, \mathit{Act}, \mathit{trans}^{\mathit{may}}, \mathit{trans}^{\mathit{must}}, I, \mathit{AP}, L)$ *to be its* abstraction*, where* $\mathit{trans}^{\mathit{may}} = \{t \in \mathit{trans} \mid \alpha^{\text{join}}(\delta(t)) = \mathit{true}\}$*, and* $\mathit{trans}^{\mathit{must}} = \{t \in \mathit{trans} \mid \widetilde{\alpha^{\text{join}}}(\delta(t)) = \mathit{true}\}$*.*
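As a minimal sketch of Definition 4, the function below derives the may and must transition relations from a list of guarded transitions. The encoding (tuples with a predicate as presence condition) and the toy transition system are assumptions made for illustration: the `pay` and `cancel` transitions follow the presence conditions mentioned in the examples, while the `serve` transition is invented and is not taken from Fig. 1.

```python
def abstract_fts(transitions, configs):
    """Return the (may, must) transition lists of the abstract MTS.

    transitions: iterable of (source, action, target, presence_condition),
    where presence_condition is a predicate over configurations.
    """
    may = [(s, a, t) for (s, a, t, psi) in transitions
           if any(psi(k) for k in configs)]   # join abstraction yields true
    must = [(s, a, t) for (s, a, t, psi) in transitions
            if all(psi(k) for k in configs)]  # dual join abstraction yields true
    return may, must


# Assumed configuration space over the features c and f.
K = [frozenset(), frozenset({"c"}), frozenset({"f"}), frozenset({"c", "f"})]

# Toy FTS fragment (hypothetical; "serve" is invented for the example).
toy_trans = [
    ("s0", "pay", "s1", lambda k: "f" not in k),
    ("s1", "cancel", "s0", lambda k: "c" in k),
    ("s1", "serve", "s2", lambda k: True),
]

may, must = abstract_fts(toy_trans, K)
print(may)   # all three transitions are may-transitions
print(must)  # only the transition guarded by true is a must-transition

# Projecting onto the configurations that enable c turns "cancel" into a must.
K_c = [k for k in K if "c" in k]
may_c, must_c = abstract_fts(toy_trans, K_c)
print(("s1", "cancel", "s0") in must_c)
```

For a non-empty configuration set, every must-transition is also a may-transition, since a condition satisfied by all configurations is satisfied by at least one.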

Note that the abstract model $\alpha^{\text{join}}(\mathcal{F})$ has no variability in it, i.e. it contains only one abstract configuration. The 3-valued semantics of the MTS $\alpha^{\text{join}}(\mathcal{F})$ is designed to be *sound* in the sense that it preserves both satisfaction ($\mathit{tt}$) and refutation ($\mathit{ff}$) of a formula from the abstract model to the concrete one. However, if the truth value of a formula in the abstract model is $\bot$, then its value over the concrete model is not known. We prove [16, Appendix B]:

**Theorem 1 (Preservation results).** *For every* $\Phi \in$ CTL*, we have:*

$$\begin{array}{l} \textbf{(1)}\ [\alpha^{\text{join}}(\mathcal{F}) \models^3 \Phi] = \mathit{tt} \implies [\mathcal{F} \models \Phi] = \mathit{tt}. \\ \textbf{(2)}\ [\alpha^{\text{join}}(\mathcal{F}) \models^3 \Phi] = \mathit{ff} \implies [\mathcal{F} \models \Phi] = \mathit{ff} \text{ and } [\pi_k(\mathcal{F}) \models \Phi] = \mathit{ff} \text{ for all } k \in \mathbb{K}. \end{array}$$

*Divide-and-conquer strategy.* The problem of evaluating $[\mathcal{F} \models \Phi]$ can be reduced to a number of smaller problems by partitioning the configuration space $\mathbb{K}$. Let the subsets $\mathbb{K}_1, \mathbb{K}_2, \ldots, \mathbb{K}_n$ form a *partition* of the set $\mathbb{K}$. Then, $[\mathcal{F} \models \Phi] = \mathit{tt}$ iff $[\pi_{\mathbb{K}_i}(\mathcal{F}) \models \Phi] = \mathit{tt}$ for all $i = 1, \ldots, n$. Also, $[\mathcal{F} \models \Phi] = \mathit{ff}$ iff $[\pi_{\mathbb{K}_j}(\mathcal{F}) \models \Phi] = \mathit{ff}$ for some $1 \le j \le n$. By using Theorem 1, we obtain the following result.

**Corollary 1.** *Let* $\mathbb{K}_1, \mathbb{K}_2, \ldots, \mathbb{K}_n$ *form a* partition *of* $\mathbb{K}$*.*

$$\begin{array}{l} \textbf{(1)}\ \text{If } [\alpha^{\text{join}}(\pi_{\mathbb{K}_1}(\mathcal{F})) \models^3 \Phi] = \mathit{tt} \land \dots \land [\alpha^{\text{join}}(\pi_{\mathbb{K}_n}(\mathcal{F})) \models^3 \Phi] = \mathit{tt}, \text{ then } [\mathcal{F} \models \Phi] = \mathit{tt}. \\ \textbf{(2)}\ \text{If } [\alpha^{\text{join}}(\pi_{\mathbb{K}_j}(\mathcal{F})) \models^3 \Phi] = \mathit{ff} \text{ for some } 1 \le j \le n, \text{ then } [\mathcal{F} \models \Phi] = \mathit{ff} \text{ and } [\pi_k(\mathcal{F}) \models \Phi] = \mathit{ff} \text{ for all } k \in \mathbb{K}_j. \end{array}$$
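Corollary 1 suggests a simple way of combining the abstract verdicts obtained for the blocks of a partition. The following sketch is illustrative only; the string encoding of the three truth values and the function name `combine` are assumptions.

```python
def combine(verdicts):
    """Combine per-block abstract verdicts ("tt", "ff", "unknown").

    - some block refuted      -> the family violates the property ("ff")
    - every block satisfied   -> the family satisfies the property ("tt")
    - otherwise               -> no conclusion for the whole family
    """
    verdicts = list(verdicts)
    if any(v == "ff" for v in verdicts):
        return "ff"
    if all(v == "tt" for v in verdicts):
        return "tt"
    return "unknown"


print(combine(["tt", "tt"]))            # tt
print(combine(["tt", "unknown", "ff"]))  # ff
print(combine(["tt", "unknown"]))        # unknown
```

Note that a single refuted block settles the verdict for the whole family even when other blocks are still indefinite, whereas satisfaction requires a definite answer for every block.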

*Example 3.* Recall the FTS VendMach of Fig. 1. Figure 3 shows the MTS $\alpha^{\text{join}}(\text{VendMach})$, where the allowed (may) part of the behavior includes the transitions that are associated with the optional features $c$ and $f$ in VendMach, and the required (must) part includes transitions with the presence condition *true*. Consider the properties introduced in Example 2. We have $[\alpha^{\text{join}}(\text{VendMach}) \models^3 \Phi_1] = \bot$ and $[\alpha^{\text{join}}(\text{VendMach}) \models^3 \Phi_2] = \bot$, so we cannot conclude whether $\Phi_1$ and $\Phi_2$ are satisfied by VendMach or not.

## **4 Game-Based Abstract Lifted Model Checking**

The 3-valued model checking game [24,25] on an MTS $\mathcal{M}$ with state set $S$, a state $s \in S$, and a CTL formula $\Phi$ is played by Player $\forall$ and Player $\exists$ in order to evaluate $\Phi$ in $s$ of $\mathcal{M}$. The goal of Player $\forall$ is either to refute $\Phi$ on $\mathcal{M}$ or to prevent Player $\exists$ from verifying it. The goal of Player $\exists$ is either to verify $\Phi$ on $\mathcal{M}$ or to prevent Player $\forall$ from refuting it. The *game board* is the Cartesian product $S \times \mathit{sub}(\Phi)$, where $\mathit{sub}(\Phi)$ is defined as:

$$\begin{array}{ll} \text{if } \Phi = \mathit{true}, \mathit{false}, l, & \text{then } \mathit{sub}(\Phi) = \{\Phi\}; \\ \text{if } \Phi = \text{Æ}\bigcirc\Phi_1, & \text{then } \mathit{sub}(\Phi) = \{\Phi\} \cup \mathit{sub}(\Phi_1); \\ \text{if } \Phi = \Phi_1 \land \Phi_2,\ \Phi_1 \lor \Phi_2, & \text{then } \mathit{sub}(\Phi) = \{\Phi\} \cup \mathit{sub}(\Phi_1) \cup \mathit{sub}(\Phi_2); \\ \text{if } \Phi = \text{Æ}(\Phi_1 \mathsf{U} \Phi_2),\ \text{Æ}(\Phi_1 \mathsf{V} \Phi_2), & \text{then } \mathit{sub}(\Phi) = \exp(\Phi) \cup \mathit{sub}(\Phi_1) \cup \mathit{sub}(\Phi_2) \end{array}$$

where Æ ranges over both A and E. The expansion exp(Φ) is defined as:

$$\begin{array}{l} \Phi = \text{Æ}(\Phi_1 \mathsf{U} \Phi_2) : \exp(\Phi) = \{ \Phi, \Phi_2 \vee (\Phi_1 \wedge \text{Æ} \bigcirc \Phi), \Phi_1 \wedge \text{Æ} \bigcirc \Phi, \text{Æ} \bigcirc \Phi \} \\ \Phi = \text{Æ}(\Phi_1 \mathsf{V} \Phi_2) : \exp(\Phi) = \{ \Phi, \Phi_2 \wedge (\Phi_1 \vee \text{Æ} \bigcirc \Phi), \Phi_1 \vee \text{Æ} \bigcirc \Phi, \text{Æ} \bigcirc \Phi \} \end{array}$$
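The definitions of the subformula closure and of the expansion can be turned into a short recursive computation. The sketch below assumes a tuple encoding of CTL formulas that is purely illustrative (`("lit", name)`, `("X", q, f)`, `("and", f, g)`, `("or", f, g)`, `("U", q, f, g)`, `("V", q, f, g)` with path quantifier `q` equal to `"A"` or `"E"`), and it assumes that the expansion is applied uniformly for both path quantifiers.

```python
def exp(phi):
    """Expansion of an until (U) or release (V) formula."""
    op, q, p1, p2 = phi
    nxt = ("X", q, phi)                 # quantified next-step formula over phi itself
    if op == "U":
        inner = ("and", p1, nxt)
        outer = ("or", p2, inner)
    else:                               # op == "V"
        inner = ("or", p1, nxt)
        outer = ("and", p2, inner)
    return {phi, outer, inner, nxt}


def sub(phi):
    """Subformula closure used as the second component of the game board."""
    op = phi[0]
    if op in ("true", "false", "lit"):
        return {phi}
    if op == "X":
        return {phi} | sub(phi[2])
    if op in ("and", "or"):
        return {phi} | sub(phi[1]) | sub(phi[2])
    if op in ("U", "V"):
        return exp(phi) | sub(phi[2]) | sub(phi[3])
    raise ValueError(f"unknown operator: {op}")


# A(not r U r), with the negated literal encoded as the literal "!r".
phi1 = ("U", "A", ("lit", "!r"), ("lit", "r"))
print(len(sub(phi1)))  # 6
```

For this formula the closure contains the four formulas of the expansion plus the two literals, so the game board for an MTS with state set $S$ has at most $6 \cdot |S|$ configurations.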

A *single play* from $(s, \Phi)$ is a possibly infinite sequence of configurations $C_0 \to_{p_0} C_1 \to_{p_1} C_2 \to_{p_2} \ldots$, where $C_0 = (s, \Phi)$, $C_i \in S \times \mathit{sub}(\Phi)$, and $p_i \in \{\text{Player } \forall, \text{Player } \exists\}$. The subformula in $C_i$ determines which player $p_i$ makes the next move. The possible moves at each configuration are:


**(4)** if $C_i = (s, \Phi_1 \land \Phi_2)$, then Player $\forall$ chooses $j \in \{1, 2\}$ and $C_{i+1} = (s, \Phi_j)$.

**(5)** if $C_i = (s, \Phi_1 \lor \Phi_2)$, then Player $\exists$ chooses $j \in \{1, 2\}$ and $C_{i+1} = (s, \Phi_j)$.

**(6), (7)** if $C_i = (s, \text{Æ}(\Phi_1 \mathsf{U} \Phi_2))$, then $C_{i+1} = (s, \Phi_2 \lor (\Phi_1 \land \text{Æ}\bigcirc\text{Æ}(\Phi_1 \mathsf{U} \Phi_2)))$.

**(8), (9)** if $C_i = (s, \text{Æ}(\Phi_1 \mathsf{V} \Phi_2))$, then $C_{i+1} = (s, \Phi_2 \land (\Phi_1 \lor \text{Æ}\bigcirc\text{Æ}(\Phi_1 \mathsf{V} \Phi_2)))$.

The moves (6)–(9) are deterministic, thus any player can make them.

A play is a *maximal play* iff it is infinite or ends in a terminal configuration. A play is infinite [26] iff there is exactly one subformula of the form AU, AV, EU, or EV that occurs infinitely often in the play. Such a subformula is called a *witness*. We have the following *winning criteria*:


A *strategy* is a set of rules for a player, telling the player which move to choose in the current configuration. A *winning strategy* from $(s, \Phi)$ is a set of rules allowing the player to win every play that starts at $(s, \Phi)$, provided the player follows the rules. It was shown in [24,25] that the model checking problem of evaluating $[\mathcal{M}, s \models^3 \Phi]$ can be reduced to the problem of finding which player has a winning strategy from $(s, \Phi)$ (i.e. to solving the given 3-valued model checking game).

The algorithm proposed in [24,25] for solving the given 3-valued model checking game consists of two parts. First, it constructs a *game-graph*, then it runs an *algorithm for coloring* the game-graph. The game-graph is $G_{\mathcal{M} \times \Phi} = (N, E)$, where $N \subseteq S \times \mathit{sub}(\Phi)$ is the set of nodes and $E \subseteq N \times N$ is the set of edges. $N$ contains a node for each configuration that was reached during the construction of the game-graph, which starts from the initial configurations $I \times \{\Phi\}$ and proceeds in a BFS manner, and $E$ contains an edge for each possible move that was applied. The nodes of the game-graph can be classified as: terminal nodes, $\land$-nodes, $\lor$-nodes, $\mathsf{A}$-nodes, and $\mathsf{E}$-nodes. Similarly, the edges can be classified as: progress edges, which originate in $\mathsf{A}$ or $\mathsf{E}$ nodes and reflect real transitions of the MTS $\mathcal{M}$, and auxiliary edges, which are all other edges. We distinguish two types of progress edges, two types of children, and two types of SCCs (Strongly Connected Components). *Must-edges* (*may-edges*) are edges based on must-transitions (may-transitions) of MTSs. A node $n'$ is a *must-child* (*may-child*) of the node $n$ if there exists a must-edge (may-edge) $(n, n')$. A *must-SCC* (*may-SCC*) is an SCC in which all progress edges are must-edges (may-edges).

The game-graph is partitioned into its may-Maximal SCCs (may-MSCCs), denoted $Q_i$'s. This partition induces a partial order $\le$ on the $Q_i$'s, such that edges go out of a set $Q_i$ only to itself or to a smaller set $Q_j$. The partial order is extended to a total order $\le$ arbitrarily. The *coloring algorithm* processes the $Q_i$'s according to $\le$, bottom-up. Let $Q_i$ be the smallest set that is not fully colored. The nodes of $Q_i$ are colored in two phases, as follows.

*Phase 1.* Apply these rules to all nodes in $Q_i$ until none of them is applicable.


*Phase 2.* If after propagation of the rules of Phase 1, there are still nodes in $Q_i$ that remain uncolored, then $Q_i$ must be a non-trivial may-MSCC that has exactly one witness. We consider two cases.

**Case** U. The witness is of the form A(Φ1UΦ2) or E(Φ1UΦ2).

*Phase 2a.* Repeatedly color by ? each node in $Q_i$ that satisfies one of the following conditions, until there is no change:

(1) An $\mathsf{A}$ node all of whose must-children are colored by T or ?;

(2) An $\mathsf{E}$ node that has a may-child colored by T or ?;

(3) An $\land$ node both of whose children are colored by T or ?;

(4) An $\lor$ node that has a child colored by T or ?.

In fact, each node for which the F option is no longer possible according to the rules of Phase 1 is colored by ?.

*Phase 2b.* Color the remaining nodes in $Q_i$ by F.
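Phases 2a and 2b for Case U amount to a small fixpoint computation. The sketch below is a hedged illustration: the dictionaries `kind`, `children`, `must_children` and `may_children`, the node names, and the choice of treating the until-node as an $\lor$-node with a single child are all assumptions, and Phase 1 is taken as already applied (its rules are not reproduced here).

```python
def color_phase2_case_u(Q, kind, children, must_children, may_children, color):
    """Phase 2 (Case U) on the uncolored nodes of one may-MSCC Q."""
    color = dict(color)  # work on a copy of the colouring computed so far

    def t_or_unknown(n):
        return color.get(n) in ("T", "?")

    changed = True
    while changed:                       # Phase 2a: iterate until no change
        changed = False
        for n in Q:
            if n in color:
                continue
            if kind[n] == "A":
                not_false = all(t_or_unknown(c) for c in must_children[n])
            elif kind[n] == "E":
                not_false = any(t_or_unknown(c) for c in may_children[n])
            elif kind[n] == "and":
                not_false = all(t_or_unknown(c) for c in children[n])
            else:                        # "or"
                not_false = any(t_or_unknown(c) for c in children[n])
            if not_false:
                color[n] = "?"
                changed = True
    for n in Q:                          # Phase 2b: the rest is colored F
        color.setdefault(n, "F")
    return color


# Toy game-graph for A(not r U r) on one state s0 where r does not hold.
Q = ["u", "o", "a", "x"]                 # until, or, and, A-next nodes
kind = {"u": "or", "o": "or", "a": "and", "x": "A"}
children = {"u": ["o"], "o": ["r0", "a"], "a": ["nr0", "x"]}
terminals = {"r0": "F", "nr0": "T"}      # (s0, r) and (s0, not r)

# The self-loop on s0 is a may-transition only.
may_loop = color_phase2_case_u(Q, kind, children, {"x": []}, {"x": ["u"]}, terminals)
# The self-loop on s0 is a must-transition.
must_loop = color_phase2_case_u(Q, kind, children, {"x": ["u"]}, {"x": ["u"]}, terminals)
print(may_loop["u"], must_loop["u"])     # ? F
```

In the first scenario the loop is only possible, so the outcome is indefinite; in the second the loop is guaranteed and never reaches a state satisfying $r$, so the until formula is refuted.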

**Case** V. The witness is of the form A(Φ1VΦ2) or E(Φ1VΦ2) (see [16, Appendix B]).

The result of the coloring is a *3-valued coloring function* $\chi : N \to \{T, F, ?\}$.
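The partition of the game-graph into maximal SCCs, together with the bottom-up order in which the coloring algorithm visits them, can be obtained with a standard SCC algorithm. The sketch below uses Tarjan's algorithm, which is our choice for illustration (the source does not prescribe a particular algorithm); it assumes the graph is given as an adjacency dictionary over all edges of the game-graph. Tarjan's algorithm emits a component only after every component reachable from it, which gives the required order directly.

```python
def sccs_bottom_up(graph):
    """Return the SCCs of `graph` so that each SCC appears after all SCCs
    reachable from it (i.e. smallest components first)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return result


# Toy graph: a -> b, b <-> c, c -> d.
toy = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
print(sccs_bottom_up(toy))  # [{'d'}, {'b', 'c'}, {'a'}] (set order may vary)
```

The recursive formulation is kept for brevity; for large game-graphs an iterative variant would avoid Python's recursion limit.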

**Theorem 2 (**[24]**).** *For each* $n = (s, \Phi') \in G_{\mathcal{M} \times \Phi}$*:*


**(3)** $[(\mathcal{M}, s) \models^3 \Phi'] = \bot$ *iff* $\chi(n) = {?}$ *iff none of the players has a winning strategy at* $n$.

**Fig. 4.** The colored game-graph for $\alpha^{\text{join}}(\text{VendMach})$ and $\Phi_1 = \mathsf{A}(\neg r\,\mathsf{U}\,r)$. (Color figure online)

Using Theorems 1 and 2, given the colored game-graph of the MTS $\alpha^{\text{join}}(\mathcal{F})$, if all its initial nodes are colored by T then $[\mathcal{F} \models \Phi] = \mathit{tt}$, and if at least one of them is colored by F then $[\mathcal{F} \models \Phi] = \mathit{ff}$. Otherwise, we do not know.

*Example 4.* The colored game-graph for the MTS $\alpha^{\text{join}}(\text{VendMach})$ and $\Phi_1 = \mathsf{A}(\neg r\,\mathsf{U}\,r)$ is shown in Fig. 4. Green, red (with dashed borders), and white nodes denote nodes colored by T, F, and ?, respectively. Each of the partitions $Q_1$ to $Q_6$ consists of a single node shown in Fig. 4, while $Q_7$ contains all the other nodes. The initial node $(s_0, \Phi_1)$ is colored by ?, so we obtain an indefinite answer.

## **5 Incremental Refinement Framework**

Given an FTS $\pi_{\mathbb{K}'}(\mathcal{F})$ with a configuration set $\mathbb{K}' \subseteq \mathbb{K}$, we show how to exploit the game-graph of the abstract MTS $\mathcal{M} = \alpha^{\text{join}}(\pi_{\mathbb{K}'}(\mathcal{F}))$ in order to do refinement in case the model checking resulted in an indefinite answer. The refinement consists of two parts. First, we use the information gained by the coloring algorithm of $G_{\mathcal{M} \times \Phi}$ in order to split the single abstract configuration $\mathit{true} \in \alpha^{\text{join}}(\mathbb{K}')$ that represents the whole concrete configuration set $\mathbb{K}'$. We then construct the refined abstract models, using the refined abstract configurations.


**Fig. 5.** The refinement procedure that checks $[\mathcal{F} \models \Phi]$.

There are a failure node and a failure reason associated with an indefinite answer. The goal in the refinement is to find and eliminate at least one of the failure reasons.

**Definition 5.** *A node* n *is a* failure node *if it is colored by* ?*, whereas none of its children was colored by* ? *at the time* n *got colored by the coloring algorithm.*

Such a failure node can be seen as the point where the loss of information occurred, so we can use it in the refinement step to change the final model checking result.

**Lemma 1 (**[24]**).** *A failure node is one of the following.*


Given a failure node $n = (s, \Phi)$, suppose that its may-child is $n' = (s', \Phi'_1)$ as identified in Lemma 1. Then the may-edge from $n$ to $n'$ is considered as *the failure reason*. Since the failure reason is a may-transition in the abstract MTS $\alpha^{\text{join}}(\pi_{\mathbb{K}'}(\mathcal{F}))$, it needs to be refined in order to result either in a must-transition or in no transition at all. Let $s \xrightarrow{\alpha/\psi} s'$ be the transition in the concrete model $\pi_{\mathbb{K}'}(\mathcal{F})$ corresponding to the above (failure) may-transition. We split the configuration space $\mathbb{K}'$ into the subsets $[\![\psi]\!]$ and $[\![\neg\psi]\!]$, and we partition $\pi_{\mathbb{K}'}(\mathcal{F})$ into $\pi_{[\![\psi]\!] \cap \mathbb{K}'}(\mathcal{F})$ and $\pi_{[\![\neg\psi]\!] \cap \mathbb{K}'}(\mathcal{F})$. Then, we repeat the verification process based on the abstract models $\alpha^{\text{join}}(\pi_{[\![\psi]\!] \cap \mathbb{K}'}(\mathcal{F}))$ and $\alpha^{\text{join}}(\pi_{[\![\neg\psi]\!] \cap \mathbb{K}'}(\mathcal{F}))$. Note that, in the former, $s \xrightarrow{\alpha} s'$ becomes a must-transition, while in the latter, $s \xrightarrow{\alpha} s'$ is removed. The complete refinement procedure is shown in Fig. 5. We prove that (see [16, Appendix A]):

**Theorem 3.** *The procedure* Verify$(\mathcal{F}, \mathbb{K}, \Phi)$ *terminates and is correct.*
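The overall shape of the refinement loop can be sketched as a recursion over configuration sets. This is not the procedure of Fig. 5 itself: the function names, the string encoding of verdicts, and in particular `toy_check`, which stands in for game-based model checking of the abstract model and picks a splitting feature from a hard-coded ground truth, are all hypothetical.

```python
def verify(configs, check, results=None):
    """Recursive sketch of abstraction refinement by configuration splitting.

    check(configs) stands in for model checking the abstraction of the
    projection onto `configs`; it returns ("tt", None), ("ff", None) or
    ("unknown", psi), where psi is the presence condition of the failure reason.
    """
    if results is None:
        results = {}
    verdict, psi = check(configs)
    if verdict != "unknown":
        for k in configs:
            results[k] = verdict
        return results
    with_psi = frozenset(k for k in configs if psi(k))
    without_psi = frozenset(configs) - with_psi
    # A may-but-not-must transition makes both parts non-empty, so each
    # recursive call works on a strictly smaller configuration set.
    verify(with_psi, check, results)
    verify(without_psi, check, results)
    return results


# Hypothetical ground truth per variant (features c and f), used only by the stub.
truth = {
    frozenset(): True,
    frozenset({"f"}): True,
    frozenset({"c", "f"}): True,
    frozenset({"c"}): False,
}


def toy_check(configs):
    """Stub oracle: definite only when all variants in `configs` agree."""
    values = {truth[k] for k in configs}
    if values == {True}:
        return "tt", None
    if values == {False}:
        return "ff", None
    feature = next(f for f in ("c", "f")
                   if 0 < sum(f in k for k in configs) < len(configs))
    return "unknown", (lambda k: feature in k)


results = verify(frozenset(truth), toy_check)
print(results[frozenset({"c"})])       # ff
print(results[frozenset({"c", "f"})])  # tt
```

Termination in this sketch relies on every split producing two non-empty parts, which mirrors the observation that a failure reason is a transition present in some but not all variants of the current configuration set.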

*Example 5.* We can do a failure analysis on the game-graph of $\alpha^{\text{join}}(\text{VendMach})$ in Fig. 4. The failure node is $(s_1, \mathsf{A}\bigcirc\mathsf{A}(\neg r\,\mathsf{U}\,r))$ and the reason is the may-edge $(s_1, \mathsf{A}\bigcirc\mathsf{A}(\neg r\,\mathsf{U}\,r)) \xrightarrow{\mathit{cancel}} (s_0, \mathsf{A}(\neg r\,\mathsf{U}\,r))$. The corresponding concrete transition in VendMach is $s_1 \xrightarrow{\mathit{cancel}/c} s_0$. So, we partition the configuration space $\mathbb{K}^{\text{VM}}$ into subsets $[\![c]\!]$ and $[\![\neg c]\!]$, and in the second iteration we consider the FTSs $\pi_{[\![c]\!]}(\text{VendMach})$ and $\pi_{[\![\neg c]\!]}(\text{VendMach})$.

**Fig. 6.** $G_{\alpha^{\text{join}}(\pi_{[\![c]\!]}(\text{VendMach})) \times \Phi_1}$. **Fig. 7.** $\alpha^{\text{join}}(\pi_{[\![c]\!]}(\text{VendMach}))$

The game-based model checking algorithm provides us with a convenient framework to use results from previous iterations and avoid unnecessary calculations. At the end of the $i$-th iteration of abstraction-refinement, we remember those nodes that were colored by definite colors. Let $D$ denote the set of such nodes. Let $\chi_D : D \to \{T, F\}$ be the coloring function that maps each node in $D$ to its definite color. The incremental approach uses this information both in the construction of the game-graph and in its coloring. During the construction of a new refined game-graph, performed in a BFS manner in the $(i+1)$-th iteration, we prune the game-graph at nodes that are in $D$. When a node $n \in D$ is encountered, we add $n$ to the game-graph and do not continue to construct the game-graph from $n$ onwards. That is, $n \in D$ is considered a terminal node and is colored by its previous color. As a result of this pruning, only the reachable sub-graph that was previously colored by ? is refined.
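The pruning step can be sketched as a breadth-first construction that stops at previously decided nodes. The function name `build_pruned_graph`, the `successors` callback, and the toy graph are illustrative assumptions; only the pruning rule itself follows the description above.

```python
from collections import deque


def build_pruned_graph(initial, successors, definite):
    """BFS construction of a game-graph that treats nodes in `definite`
    (node -> "T" or "F" from the previous iteration) as terminal nodes."""
    nodes = set(initial)
    edges = set()
    color = {}
    queue = deque(initial)
    while queue:
        n = queue.popleft()
        if n in definite:
            color[n] = definite[n]      # reuse the earlier definite color
            continue                    # do not expand below n
        for m in successors(n):
            edges.add((n, m))
            if m not in nodes:
                nodes.add(m)
                queue.append(m)
    return nodes, edges, color


# Toy graph: a -> b -> d and a -> c -> e; b was colored T in an earlier iteration.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
D = {"b": "T"}
nodes, edges, color = build_pruned_graph(["a"], lambda n: succ[n], D)
print(sorted(nodes))  # ['a', 'b', 'c', 'e']
print(color)          # {'b': 'T'}
```

The node `d` is never constructed because it is only reachable through the already decided node `b`, which is exactly the saving the incremental approach aims for.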

*Example 6.* The property $\Phi_1$ holds for $\pi_{[\![\neg c]\!]}(\text{VendMach})$. The initial node of the game-graph $G_{\alpha^{\text{join}}(\pi_{[\![\neg c]\!]}(\text{VendMach})) \times \Phi_1}$ (see [16, Fig. 13, Appendix C]) is colored by T. On the other hand, we obtain an indefinite answer for $\pi_{[\![c]\!]}(\text{VendMach})$. The model $\alpha^{\text{join}}(\pi_{[\![c]\!]}(\text{VendMach}))$ is shown in Fig. 7, whereas the final colored game-graph $G_{\alpha^{\text{join}}(\pi_{[\![c]\!]}(\text{VendMach})) \times \Phi_1}$ is given in Fig. 6. The failure node is $(s_0, \mathsf{A}\bigcirc\mathsf{A}(\neg r\,\mathsf{U}\,r))$, and the reason is the may-edge $(s_0, \mathsf{A}\bigcirc\mathsf{A}(\neg r\,\mathsf{U}\,r)) \xrightarrow{\mathit{pay}} (s_1, \mathsf{A}(\neg r\,\mathsf{U}\,r))$. The corresponding concrete transition in $\pi_{[\![c]\!]}(\text{VendMach})$ is $s_0 \xrightarrow{\mathit{pay}/\neg f} s_1$. So, in the third iteration we consider the FTSs $\pi_{[\![c \land \neg f]\!]}(\text{VendMach})$ and $\pi_{[\![c \land f]\!]}(\text{VendMach})$.

The initial node of the graph $G_{\alpha^{\text{join}}(\pi_{[\![c \land \neg f]\!]}(\text{VendMach})) \times \Phi_1}$ (see [16, Fig. 16, Appendix C]) is colored by F in Phase 2b. The initial node of $G_{\alpha^{\text{join}}(\pi_{[\![c \land f]\!]}(\text{VendMach})) \times \Phi_1}$ (see [16, Fig. 17, Appendix C]) is colored by T.

In the end, we conclude that $\Phi_1$ is satisfied by the variants $\{\neg c \land \neg f, \neg c \land f, c \land f\}$, and $\Phi_1$ is violated by the variant $\{c \land \neg f\}$.

On the other hand, we need two iterations to conclude that $\Phi_2 = \mathsf{E}(\neg r\,\mathsf{U}\,r)$ is satisfied by all variants in $\mathbb{K}^{\text{VM}}$ (see [16, Appendix D] for details).

## **6 Evaluation**

To evaluate our approach, we use a synthetic example to demonstrate specific characteristics of our approach, and the Elevator model, which is often used as a benchmark in the SPL community [4,12,15,20,23]. We compare (1) our abstraction-refinement procedure Verify with the game-based model checking algorithm, implemented in Java from scratch, vs. (2) the family-based version of the NuSMV model checker, denoted fNuSMV, which implements the standard lifted model checking algorithm [5]. For each experiment, we measure T(ime) to perform an analysis task, and Call, which is the number of times an approach calls the model checking engine. All experiments were executed on a 64-bit Intel Core™ i5-3337U CPU running at 1.80 GHz with 8 GB memory. All experimental data is available from: https://aleksdimovski.github.io/automaticctl.html.

*Synthetic example.* The FTS $M_n$ (where $n > 0$) consists of $n$ features $A_1, \ldots, A_n$ and an integer data variable $x$, such that the set $\mathit{AP}$ consists of all evaluations of $x$ which assign nonnegative integer values to $x$. The set of valid configurations is $\mathbb{K}_n = 2^{\{A_1, \ldots, A_n\}}$. $M_n$ has a tree-like structure, whose root is the initial state with $x = 0$. In each level $k$ ($k \ge 1$), there are two states that can be reached with two transitions leading from a state in the previous level. One transition is allowed for variants with the feature $A_k$ enabled, so that in the target state the variable's value is $x + 2^{k-1}$, where $x$ is its value in the source state, whereas the other transition is allowed for variants with $A_k$ disabled, so that the value of $x$ does not change. For example, $M_2$ is shown in Fig. 8, where in each state we show the current value of $x$ and all transitions have the silent action $\tau$.
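The structure of $M_n$ means that each variant follows a single path through the tree, and the value of $x$ at the end of that path is determined by the enabled features. The sketch below computes that value per configuration; encoding a configuration as a tuple of booleans for $A_1, \ldots, A_n$ and the function name `final_x` are assumptions made for illustration.

```python
from itertools import product


def final_x(config):
    """Value of x at the last level for one variant.

    config[k-1] tells whether feature A_k is enabled; enabling A_k adds 2^(k-1).
    """
    return sum(2 ** (k - 1) for k, enabled in enumerate(config, start=1) if enabled)


n = 3
configs = list(product([False, True], repeat=n))   # all 2^n configurations
values = sorted(final_x(c) for c in configs)
print(values)  # [0, 1, 2, 3, 4, 5, 6, 7]

# x never decreases along a path, so a variant can only miss "x >= 1"
# if its final value is still 0.
never_reaching_one = [c for c in configs if final_x(c) < 1]
print(never_reaching_one)  # [(False, False, False)]
```

Each configuration yields a distinct final value, and only the configuration with all features disabled keeps $x$ at 0 along its whole path.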

We consider two properties: $\Phi = \mathsf{A}(\mathit{true}\,\mathsf{U}\,(x \ge 0))$ and $\Phi' = \mathsf{A}(\mathit{true}\,\mathsf{U}\,(x \ge 1))$. The property $\Phi$ is satisfied by all variants in $\mathbb{K}$, whereas $\Phi'$ is violated only by one configuration, $\neg A_1 \land \ldots \land \neg A_n$ (where all features are disabled). We have verified $M_n$ against $\Phi$ and $\Phi'$ using fNuSMV (e.g. see the fNuSMV models for $M_1$ and $M_2$ in [16, Fig. 23, Appendix E]). We have also checked $M_n$ using our Verify procedure. For $\Phi$, Verify terminates in one iteration since $\alpha^{\text{join}}(M_n)$ satisfies $\Phi$ (see $G_{\alpha^{\text{join}}(M_1) \times \Phi}$ in [16, Fig. 24, Appendix E]). For $\Phi'$, Verify needs $n + 1$ iterations. First, an indefinite result is reported for $\alpha^{\text{join}}(M_n)$ (e.g. see $G_{\alpha^{\text{join}}(M_1) \times \Phi'}$ in [16, Fig. 27, Appendix E]), and the configuration space is split into the subsets $[\![\neg A_1]\!]$ and $[\![A_1]\!]$. The refinement procedure proceeds in this way until we obtain definite results for all variants. The performance results are shown in Fig. 9. Notice that fNuSMV reports all results in only one iteration. As $n$ grows, Verify becomes faster than fNuSMV. For $n = 11$ ($|\mathbb{K}| = 2^{11}$), fNuSMV times out after 2 h. In contrast, Verify is feasible even for large values of $n$.


**Fig. 8.** The model $M_2$. **Fig. 9.** Verification of $M_n$ (T in seconds).


**Fig. 10.** Verification of Elevator properties (T in seconds).

*Elevator.* We have experimented with the Elevator model with four floors, designed by Plath and Ryan [23]. It contains about 300 LOC of fNuSMV code and 9 independent optional features that modify the basic behaviour of the elevator, thus yielding $2^9 = 512$ variants. To use our Verify procedure, we have manually translated the fNuSMV model into an FTS and then we have called Verify on it. The basic Elevator system consists of a single lift that travels between four floors. There are four platform buttons and a single lift, which declares variables floor, door, direction, and a further four cabin buttons. When serving a floor, the lift door opens and closes again. We consider three properties: $\Phi_1 = \mathsf{E}(\mathit{tt}\,\mathsf{U}\,(\text{floor} = 1 \land \text{idle} \land \text{door} = \text{closed}))$, $\Phi_2 = \mathsf{A}(\mathit{tt}\,\mathsf{U}\,(\text{floor} = 1 \land \text{idle} \land \text{door} = \text{closed}))$, and $\Phi_3 = \mathsf{E}(\mathit{tt}\,\mathsf{U}\,((\text{floor} = 3 \land \neg \text{liftBut3.pressed} \land \text{direction} = \text{up}) \implies \text{door} = \text{closed}))$. The performance results are shown in Fig. 10. The properties $\Phi_1$ and $\Phi_2$ are satisfied by all variants, so Verify achieves speed-ups of 28 times for $\Phi_1$ and 2.7 times for $\Phi_2$ compared to the fNuSMV approach. fNuSMV takes 1.76 s to check $\Phi_3$, whereas Verify ends in 0.67 s, thus giving a 2.6 times speed-up.

## **7 Related Work and Conclusion**

There are different formalisms for representing variability models [2,21]. Classen et al. [4] present Featured Transition Systems (FTSs). They show how specifically designed lifted model checking algorithms [5,7] can be used for verifying FTSs against LTL and CTL properties. The variability abstractions that preserve LTL are introduced in [14,15,17], and subsequently automatic abstraction refinement procedures [8,18] for lifted model checking of LTL are proposed, by using Craig interpolation to define the refinement. The variability abstractions that preserve the full CTL are introduced in [12], but they are constructed manually and no notion of refinement is defined there. In this paper, we define an automatic abstraction refinement procedure for lifted model checking of full CTL by using games to define the refinement. To the best of our knowledge, this is the first such procedure in lifted model checking.

One of the earliest attempts at using games for CTL model checking was proposed by Stirling [26]. Shoham and Grumberg [3,19,24,25] have extended this game-based approach to CTL over 3-valued semantics. In this work, we exploit and apply the game-based approach in a completely new direction, for automatic CTL verification of variability models.

The works [11,13] present an approach for software lifted model checking of #ifdef-based program families using symbolic game semantics models [10].

To conclude, in this work we present a game-based lifted model checking approach for abstract variability models with respect to full CTL. We also suggest an automatic refinement procedure for the case where the model checking result is indefinite.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Formal Verification of Safety & Security Related Timing Constraints for a Cooperative Automotive System**

Li Huang<sup>1</sup> and Eun-Young Kang<sup>2(✉)</sup>

<sup>1</sup> School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China huangl223@mail2.sysu.edu.cn

<sup>2</sup> The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark

eyk@mmmi.sdu.dk

**Abstract.** Modeling and analysis of timing constraints is crucial in real-time automotive systems. Modern vehicles are interconnected through wireless networks, which creates vulnerabilities to external malicious attacks. Violations of cyber-security can cause safety-related accidents and serious damage. To identify the potential impacts of security-related threats on safety properties of interconnected automotive systems, this paper presents analysis techniques that support verification and validation (V&V) of safety & security (S/S) related timing constraints on those systems: Probabilistic extensions of S/S timing constraints are specified in PrCcsl (a probabilistic extension of the clock constraint specification language), and the semantics of the extended constraints is translated into verifiable Uppaal models with stochastic semantics for formal verification. A set of mapping rules is proposed to facilitate the translation. An automatic translation tool, namely ProTL, is implemented based on the mapping rules. Formal verification is performed on the S/S timing constraints using Uppaal-SMC under different attack scenarios. Our approach is demonstrated on a cooperative automotive system case study.

**Keywords:** Automotive system · Safety and security · PrCcsl · Uppaal-SMC

## **1 Introduction**

Model based development (MBD) is rigorously applied in automotive systems in which the software controllers interact with physical environments. The continuous time behaviors of those systems often rely on complex dynamics as well as on stochastic behaviors. Formal verification and validation (V&V) technologies are indispensable and highly recommended for development of safe and reliable automotive systems [11,12]. Conventional V&V, i.e., testing and model checking have limitations in terms of assessing the reliability of hybrid systems due to both stochastic and non-linear dynamical features. To ensure the reliability of safety critical hybrid dynamic systems, *statistical model checking (SMC)* techniques have been proposed [7,8,19]. These techniques for fully stochastic models validate probabilistic performance properties of given deterministic (or stochastic) controllers in given stochastic environments.

Modern vehicles are being equipped with communication devices and interconnected with each other through wireless networks. Vehicular Ad Hoc Networks (Vanet) [28] are the wireless network technologies that establish communication among vehicles and roadside units (RSU). Although vehicular communication contributes to the safety and efficiency of traffic, it introduces vulnerabilities to vehicles. Transmitted information can be corrupted or modified by attackers, resulting in serious safety consequences (e.g., rear-end collision). Analysis of the potential impacts of cyber-security violations on safety properties is crucial in automotive systems. However, traditional automotive system design often addresses the correctness of safety properties without consideration of security breaches. There is still a lack of techniques that enable an integrated analysis of safety & security (S/S) properties. Moreover, message transmission in Vanet that pertains to S/S requires restrictions by time deadlines [10]. In this paper, we focus on S/S related timing constraints and propose analysis techniques that support formal verification on interconnected automotive systems.

East-adl [9,22] is an architectural description language for modeling of automotive systems. The latest release of East-adl has adopted the time model proposed in Timing Augmented Description Language (Tadl2) [5], which expresses and composes basic timing constraints, i.e., repetition rates, end-to-end delays. Tadl2 specializes the time model of MARTE, the UML profile for Modeling and Analysis of Real-Time and Embedded systems [30]. MARTE provides Ccsl, a Clock Constraint Specification Language, that supports specification of both logical and dense timing constraints, as well as functional causality constraints [16,23]. A probabilistic extension of Ccsl, called PrCcsl [14], has been proposed to formally specify timing constraints associated with stochastic properties in weakly-hard real-time systems [4], i.e., a bounded number of constraints violations would not lead to system failures when the results of the violations are negligible.

In this paper, we present a formal analysis of S/S related timing constraints for interconnected automotive systems at the design level:

1. To identify vulnerabilities of automotive systems under malicious attacks, we adopt the behavioral model of a cooperative automotive system (CAS) [13] in Uppaal-SMC and augment it with models of an RSU-aided (Raise) communication protocol in Vanet and of malicious attacks. The result is a refined behavioral model of the system that captures more details of vehicular communication and security breaches;

2. Probabilistic extensions of S/S timing constraints are specified in PrCcsl, and the semantics of the extended constraints are translated into verifiable models with stochastic semantics for formal verification;

3. A set of mapping rules is proposed to facilitate the translation, based on which an automatic translation tool, ProTL, is implemented;

4. Formal verification is performed on the S/S timing constraints using Uppaal-SMC under different attack scenarios.

The paper is organized as follows: Sect. 2 presents an overview of PrCcsl and Uppaal-SMC. CAS is introduced as a running example in Sect. 3. Section 4 presents the Uppaal-SMC model of CAS complemented with models of the Raise protocol and three types of attacks. S/S related timing constraints are specified in PrCcsl and translated into verifiable Uppaal-SMC models in Sect. 5. The applicability of our approach is demonstrated by performing verification on the CAS case study in Sect. 6. Sections 7 and 8 present related work and the conclusion.

## **2 Preliminary**

In our framework, S/S related timing constraints are specified in PrCcsl. Uppaal-SMC is employed to perform formal verification on the timing constraints.

## **2.1 Probabilistic Extension of Clock Constraint Specification Language (PrCCSL)**

PrCcsl [14] is a probabilistic extension of Ccsl [3,23] for formal specification of timing constraints associated with stochastic behaviors. In PrCcsl, a clock represents a (possibly infinite) sequence of instants. An event is a clock, and the occurrences of an event correspond to a set of ticks of the clock. PrCcsl provides two types of clock constraints, i.e., *expressions* and *relations*, to specify the progression/occurrences of clocks. An *expression* derives new clocks from already defined clocks [3]. Let $c_1, c_2 \in C$. The ITE (if-then-else) *expression*, denoted $\beta\ ?\ c_1 : c_2$, defines a new clock that behaves either as $c_1$ or as $c_2$ according to the value of the boolean variable/formula $\beta$. DelayFor (denoted $ref\ (d) \rightsquigarrow base$) results in a new clock by delaying the reference clock *ref* for *d* ticks (or *d* time units) of a *base* clock. FilterBy ($c \triangleq base\ \blacktriangledown\ u(v)$) builds a new clock *c* by filtering the instants of a *base* clock according to a binary word $w = u(v)$, where *u* is the *prefix* and *v* is the *period*; "$(v)$" denotes the infinite repetition of *v*. This expression results in a clock *c* such that $\forall k \in \mathbb{N}^+$, if the $k^{th}$ bit in *w* is 1, then *c* ticks at the $k^{th}$ tick of *base*.
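The three expressions above can be illustrated with a minimal Python sketch over discrete steps. It is only an illustration under stated assumptions, not the PrCcsl semantics as implemented in any tool: a clock is assumed to be a list of 0/1 values (one per step), `beta` is assumed to be a per-step list of booleans, `delay_for` counts only ticks of `base` that occur strictly after the tick of `ref`, and `d >= 1` and a non-empty period are required. The function names are hypothetical.

```python
from itertools import chain, cycle


def ite(beta, c1, c2):
    """beta ? c1 : c2 -- follow c1 at steps where beta holds, otherwise c2."""
    return [a if b else c for b, a, c in zip(beta, c1, c2)]


def delay_for(ref, d, base):
    """Re-emit each tick of ref at the d-th later tick of base (d >= 1)."""
    pending = []  # remaining base ticks to wait, one entry per outstanding ref tick
    out = []
    for r, b in zip(ref, base):
        fired = 0
        if b:
            pending = [p - 1 for p in pending]
            fired = 1 if any(p == 0 for p in pending) else 0
            pending = [p for p in pending if p > 0]
        if r:
            pending.append(d)  # assumption: a base tick in the same step is not counted
        out.append(fired)
    return out


def filter_by(base, prefix, period):
    """Keep the k-th tick of base iff the k-th bit of the word prefix(period) is 1."""
    word = chain(prefix, cycle(period))  # period must be non-empty
    return [next(word) if b else 0 for b in base]
```

For instance, filtering a clock with prefix `[0]` and period `[1]` drops only its first tick, which is the construction used for *fclk* in Sect. 5.1.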

A *relation* limits the occurrences among different events, which are defined based on run and history. A run corresponds to an execution of the system model where the clocks tick/progress. The history of a clock c represents the number of times the clock c has ticked prior to the current step.

**Definition 1 (Run).** *A run $R$ consists of a finite set of consecutive steps where a set of clocks tick at each step $i$. The set of clocks ticking at step $i$ is denoted as $R(i)$, i.e., for all $i$, $0 \le i \le n$, $R(i) \in R$, where $n$ is the number of steps of $R$.*

**Definition 2 (History).** *The history of clock $c$ in a run $R$ is a function $H_R^c: \mathbb{N} \to \mathbb{N}$. $H_R^c(i)$ indicates the number of times the clock $c$ has ticked prior to step $i$ in run $R$, which is initialized as 0 at step 0. It is defined as: (1) $H_R^c(0) = 0$; (2) $\forall i \in \mathbb{N}^+,\ c \notin R(i) \implies H_R^c(i+1) = H_R^c(i)$; (3) $\forall i \in \mathbb{N}^+,\ c \in R(i) \implies H_R^c(i+1) = H_R^c(i) + 1$.*

A probabilistic *relation* in PrCcsl is satisfied if and only if the probability of the *relation* constraint being satisfied is greater than or equal to the probability threshold $p \in [0, 1]$. Given $k$ runs $\{R_1, \ldots, R_k\}$, the probabilistic subclock, coincidence, exclusion and precedence in PrCcsl are defined as follows:

**Probabilistic Subclock:** $c_1 \subseteq_p c_2 \iff Pr[c_1 \subseteq c_2] \ge p$, where $Pr[c_1 \subseteq c_2] = \frac{1}{k}\sum_{j=1}^{k}\{R_j \models c_1 \subseteq c_2\}$ represents the ratio of runs that satisfy the relation out of $k$ runs. A run $R_j$ satisfies the subclock relation between $c_1$ and $c_2$ if "if $c_1$ ticks, $c_2$ must tick" holds at every step $i$ in $R_j$, i.e., $(R_j \models c_1 \subseteq c_2) \iff (\forall i,\ 0 \le i \le n,\ c_1 \in R(i) \implies c_2 \in R(i))$. "$R_j \models c_1 \subseteq c_2$" returns 1 if $R_j$ satisfies $c_1 \subseteq c_2$, otherwise it returns 0.

**Probabilistic Coincidence:** $c_1 \equiv_p c_2 \iff Pr[c_1 \equiv c_2] \ge p$, where $Pr[c_1 \equiv c_2] = \frac{1}{k}\sum_{j=1}^{k}\{R_j \models c_1 \equiv c_2\}$ represents the ratio of runs that satisfy the coincidence relation out of $k$ runs. A run $R_j$ satisfies the coincidence relation on $c_1$ and $c_2$ if the following assertion holds: $\forall i,\ 0 \le i \le n,\ (c_1 \in R(i) \implies c_2 \in R(i)) \wedge (c_2 \in R(i) \implies c_1 \in R(i))$. In other words, the coincidence relation is satisfied when the two conditions "if $c_1$ ticks, $c_2$ must tick" and "if $c_2$ ticks, $c_1$ must tick" hold at every step.

**Probabilistic Exclusion:** $c_1 \#_p c_2 \iff Pr[c_1 \# c_2] \ge p$, where $Pr[c_1 \# c_2] = \frac{1}{k}\sum_{j=1}^{k}\{R_j \models c_1 \# c_2\}$ indicates the ratio of runs that satisfy the exclusion relation out of $k$ runs. A run $R_j$ satisfies the exclusion relation on $c_1$ and $c_2$ if $\forall i,\ 0 \le i \le n,\ (c_1 \in R(i) \implies c_2 \notin R(i)) \wedge (c_2 \in R(i) \implies c_1 \notin R(i))$, i.e., at every step, if $c_1$ ticks, $c_2$ must not tick and vice versa.

**Probabilistic Precedence:** $c_1 \prec_p c_2 \iff Pr[c_1 \prec c_2] \ge p$, where $Pr[c_1 \prec c_2] = \frac{1}{k}\sum_{j=1}^{k}\{R_j \models c_1 \prec c_2\}$ denotes the ratio of runs that satisfy the precedence relation out of $k$ runs. A run $R_j$ satisfies the precedence relation if the condition $\forall i,\ 0 \le i \le n,\ (H_R^{c_1}(i) \ge H_R^{c_2}(i))$ and $(H_R^{c_2}(i) = H_R^{c_1}(i)) \implies (c_2 \notin R(i))$ holds, i.e., the history of $c_1$ is greater than or equal to the history of $c_2$, and $c_2$ must not tick when the histories of the two clocks are equal.
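The definitions above can be checked on small examples with the following Python sketch. It assumes that a run is a list of sets of clock names (one set per step) and that the probability of a relation is simply the fraction of given runs that satisfy it. The function names are hypothetical and are not taken from PrCcsl or ProTL.

```python
def history(run, c):
    """H_R^c(i): number of ticks of c strictly before step i, with H_R^c(0) = 0."""
    count, out = 0, []
    for step in run:
        out.append(count)
        if c in step:
            count += 1
    return out


def subclock(run, c1, c2):
    """If c1 ticks at a step, c2 must tick at that step."""
    return all(c2 in step for step in run if c1 in step)


def coincidence(run, c1, c2):
    """c1 and c2 tick at exactly the same steps."""
    return all((c1 in step) == (c2 in step) for step in run)


def exclusion(run, c1, c2):
    """c1 and c2 never tick at the same step."""
    return not any(c1 in step and c2 in step for step in run)


def precedence(run, c1, c2):
    """History of c1 never falls behind c2; c2 may not tick while histories are equal."""
    h1, h2 = history(run, c1), history(run, c2)
    return all(a >= b and not (a == b and c2 in step)
               for a, b, step in zip(h1, h2, run))


def probability(runs, relation, c1, c2):
    """Ratio of runs satisfying the relation out of all k runs."""
    return sum(relation(run, c1, c2) for run in runs) / len(runs)


def holds(runs, relation, c1, c2, p):
    """Probabilistic relation: satisfied iff the ratio is at least the threshold p."""
    return probability(runs, relation, c1, c2) >= p
```

For example, with the two runs `[{"a"}, {"b"}, {"a"}, {"b"}]` and `[{"b"}, {"a"}]`, the precedence of `a` over `b` holds in the first run only, so its probability is 0.5, whereas exclusion holds in both runs.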

#### **2.2 UPPAAL-SMC**

Uppaal-SMC [31] performs probabilistic analysis of properties by monitoring simulations of a complex hybrid system in a given stochastic environment and using statistical results to determine whether the system satisfies a property with some degree of confidence. Uppaal-SMC provides a number of queries related to the stochastic interpretation of Timed Automata (STA) [8]. They are as follows, where $N$ and $bound$ indicate the number of simulations to be performed and the time bound on the simulations, respectively:

1. *Probability Estimation* estimates the probability of a requirement property $\varphi$ being satisfied for a given STA model within the time bound: $Pr[bound]\ \varphi$;

2. *Hypothesis Testing* checks whether the probability of $\varphi$ being satisfied is at least a given probability $P_0$: $Pr[bound]\ \varphi \ge P_0$;

3. *Simulations*: Uppaal-SMC runs multiple simulations on the STA model, and the $k$ (state-based) properties/expressions $\varphi_1, \ldots, \varphi_k$ are monitored and visualized along the simulations: $simulate\ N\ [\le bound]\{\varphi_1, \ldots, \varphi_k\}$.
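The idea behind probability estimation can be sketched in Python as a plain Monte Carlo estimate whose sample size is fixed by the Chernoff-Hoeffding bound. This is a generic textbook scheme and an assumption of this sketch, not a description of the algorithm inside Uppaal-SMC. `simulate_once` is a hypothetical callback that runs one time-bounded simulation and reports whether $\varphi$ was satisfied.

```python
import math
import random


def runs_needed(epsilon, delta):
    """Chernoff-Hoeffding bound: N >= ln(2/delta) / (2 * epsilon^2) runs suffice
    for an estimate within +/- epsilon with confidence at least 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def estimate_probability(simulate_once, epsilon=0.05, delta=0.05, rng=None):
    """Return an interval [p_hat - epsilon, p_hat + epsilon] clipped to [0, 1]."""
    rng = rng or random.Random(0)  # fixed seed so that the sketch is reproducible
    n = runs_needed(epsilon, delta)
    hits = sum(1 for _ in range(n) if simulate_once(rng))
    p_hat = hits / n
    return max(0.0, p_hat - epsilon), min(1.0, p_hat + epsilon)
```

With `epsilon = delta = 0.05` the bound gives 738 runs and an interval of width 0.1, the same width as the intervals reported in Sect. 6. Hypothesis testing would instead compare the estimate against $P_0$, typically with a sequential test that can stop early.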

## **3 Running Example**

A cooperative automotive system (CAS) [13] is adopted to illustrate our approach. CAS includes distributed and coordinated sensing, control, and actuation over three vehicles (denoted as $v_i$, where $i \in \{0, 1, 2\}$) that are running in the same lane. As shown in Fig. 1, a lead vehicle ($v_0$) runs automatically by recognizing traffic signs on the road. Each following vehicle must set its desired velocity identical to that of its immediate preceding vehicle. Vehicles should maintain sufficient braking distance to avoid rear-end collisions while remaining close enough to guarantee communication quality. Vehicle movement relies on the availability of environmental information, e.g., traffic signs and obstacles. The position of $v_i$ is represented by the Cartesian coordinates $(x_i, y_i)$, where $x_i$ and $y_i$ are distances measured from the vehicle to the two fixed perpendicular lines, i.e., the x-axis and the y-axis, respectively.

**Fig. 1.** Overview of Cooperative Automotive System

The cooperative driving of CAS requires prompt and secure information transmission among vehicles. We adopt a roadside unit aided (Raise) [33] communication protocol in Vanet to achieve the data transmission. Each vehicle periodically broadcasts its own position and velocity to its immediate following vehicle through a wireless connection. The authentication of the identity of each vehicle and the verification of messages sent by the vehicles are performed by the RSU. For further details of Raise, refer to Sect. 4.1. The following S/S properties on CAS are considered:

R1. The follower vehicle should not overtake its leading vehicle when the vehicles run in the positive direction of the x-axis.

R2. When the lead vehicle detects a stop sign, all three vehicles must stop within a given time, e.g., 2000 ms.

R3. If the distance between a vehicle and its preceding vehicle is less than the minimum safety distance, the vehicle should decelerate within a certain time (200 ms).

R4. If the distance between a vehicle and its preceding vehicle is greater than the maximum safety distance (e.g., 100 m), the vehicle should accelerate within a certain time, e.g., 300 ms.

R5. When the lead vehicle starts to turn left (or turn right), the two follower vehicles should finish turning and run in the same lane within a given time.

R6. Authenticity: If a vehicle receives a message, its preceding vehicle must have sent a corresponding message before, i.e., the protocol should be resistant to message spoofing attacks.

R7. Secrecy: Symmetric keys of vehicles should be kept confidential from attackers.

R8. Integrity: The content of messages must not be modified during transmission, i.e., the protocol should be resistant to message falsification attacks.

R9. Freshness: The vehicles should not accept an "obsolete" message, namely, the difference between the current time and the *timestamp* of the accepted message should be less than the predefined time threshold.

R10. The symmetric key agreement (i.e., mutual authentication) process between RSU and the three vehicles should be completed within a certain time, e.g., 600 ms.

R11. A vehicle should send messages to its subsequent vehicle periodically with a period of 200 ms and a jitter of 100 ms.

Among the above S/S requirements, R1–R5 are safety [20] properties, which specify that the system should not cause undesirable effects on its environment and aim at protecting human lives, health and assets from being damaged. R6–R11 are security properties, which refer to the inability of the environment to affect the system in an undesirable way and aim to guarantee the confidentiality and integrity of transmitted information. The interdependencies among those S/S properties are conditional dependencies [17], i.e., violations of security properties can lead to violations of safety properties. The events associated with those S/S properties can be interpreted as logical clocks in PrCcsl, which provides a way to express S/S properties in terms of logical time [16]. Therefore, S/S properties can be interpreted as logical timing constraints, i.e., the temporal and causality clock *relations* in PrCcsl.

The methodology for the analysis of S/S related timing constraints in this paper is summarized in Fig. 2. First, on the basis of the existing behavioral model of CAS described in [13], we enhance the CAS model by composing it in parallel with models of the Raise protocol and malicious attacks, resulting in a refined CAS model that captures vehicular communication characteristics and security-related adversary interference. Second, we specify S/S timing constraints (R1–R11) in PrCcsl and translate the PrCcsl specifications into corresponding STA and probabilistic queries. Finally, we combine the model of CAS and the STA of the PrCcsl specifications, and perform formal verification on the combined model using Uppaal-SMC.

**Fig. 2.** Methodology for analysis of S/S timing constraints

## **4 Modeling and Refinement of CAS in UPPAAL-SMC**

The behaviors of CAS are modeled as a network of stochastic timed automata (NSTA) in Uppaal-SMC, as described in [13]. In this section, we refine the CAS model by augmenting it with models of the Raise protocol and security attacks.

## **4.1 Modeling of RAISE Protocol in UPPAAL-SMC**

We present a simplified version of Raise protocol [33] and its Uppaal-SMC model. The original Raise protocol is modified to facilitate the communication mechanism of CAS, i.e., each follower vehicle receives messages from its immediate preceding vehicle and RSU. Furthermore, timing constraints are also appended to restrict the time duration of each step (e.g., encryption and decryption) during communication process. There are two phases in Raise protocol, i.e., *symmetric key agreement* and *information transmission*.

1. **Symmetric key agreement (SKA)** is performed to obtain the symmetric key $k_i$ that secures communication, and to generate the pseudo identities $ID_i$ of vehicles, which cover their real identities. The shared symmetric key between RSU and $v_i$ is $k_i = g^{ab}$, where $g$, $a$, $b$ are three positive random numbers. As shown in Fig. 3, $Encry(msg, k)$ ($Decry(msg, k)$) denotes the encryption (decryption) of message $msg$ with key $k$, where $k$ can be either a public key or a symmetric key. $Sign(msg, k)$ generates the signature of $msg$ with a private key $k$. We use $PK_i$ to denote the public key of $v_i$ and $SK_i$ to represent the corresponding private key. "$||$" is the concatenation operation on messages.

Initially, $v_i$ randomly picks $g$ and $a$ (step 1), encrypts "$g||a$" and sends the encrypted result ($m_i$) to RSU (step 2). Upon receiving $m_i$, RSU decrypts the message (step 3). It then generates $b$ and $ID_i$, signs the message and sends the signed message ($rm_i$) to $v_i$ (steps 4 and 5). $v_i$ verifies the signature of $rm_i$ (step 6) and sends back the signature $s_i$ of $g||a||b||ID_i$ (step 7). Finally, RSU verifies the signature $s_i$ (step 8). If all the steps are completed correctly, the key agreement process succeeds.
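The key derivation at the heart of SKA can be sketched in Python as follows. The sketch covers only the computation $k_i = g^{ab}$ from values picked by the two parties. Encryption, signing and signature verification (steps 2 to 8) are deliberately left out, and the public modulus `MODULUS` is an assumption of this sketch that does not appear in the protocol description above. All function names are hypothetical.

```python
import secrets

# Assumption: exponentiation is done modulo a public prime (here the Mersenne
# prime 2^61 - 1); the text only states k_i = g^(ab).
MODULUS = 2**61 - 1


def vehicle_pick():
    """Step 1: v_i picks the positive random numbers g and a."""
    g = secrets.randbelow(MODULUS - 2) + 2  # 2 <= g < MODULUS
    a = secrets.randbelow(MODULUS - 1) + 1  # 1 <= a < MODULUS
    return g, a


def rsu_pick():
    """Step 4: RSU picks the positive random number b."""
    return secrets.randbelow(MODULUS - 1) + 1


def shared_key(g, a, b, modulus=MODULUS):
    """Both sides know g, a and b after the exchange and derive the same k_i."""
    return pow(g, a * b, modulus)
```

Because $g$, $a$ and $b$ travel inside $m_i$ and $rm_i$, an eavesdropper who can decrypt $rm_i$ can recompute the same key, which is the weakness exploited by the attacks in Sect. 4.2.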

2. **Information transmission (IT)** initiates after the SKA is completed. The traffic information (i.e., brake, direction, position and speed) of $v_i$ is integrated into a message $msg_i = brake_i||direction_i||x_i||y_i||speed_i$. As presented in Fig. 4, initially, $v_i$ generates the message authentication code (MAC) of $msg_i$ with the symmetric key $k_i$ (generated in SKA). Then, $v_i$ concatenates the MAC code with $msg_i$ and sends the result ($vm_i$) to RSU and $v_{i+1}$ (step 1). Upon receiving $vm_i$, $v_{i+1}$ checks the freshness of the message (step 2), i.e., if the time interval between the current time and the time when $vm_i$ was sent is greater than the predefined threshold, $v_{i+1}$ drops $vm_i$. At the same time, RSU checks the authenticity of $vm_i$ (step 3). If $mac_i$ is correct, RSU computes the hash code $h_i$ of message $msg_i$ (step 4). Afterwards, it encrypts $h_i$ and sends the encrypted result $hm_i$ to $v_{i+1}$ (step 5). $v_{i+1}$ decrypts $hm_i$ and obtains the hash code $h_i$ (step 6). Furthermore, to ensure the consistency of the message, $v_{i+1}$ itself also computes the hash code of $msg_i$ (step 7). It then verifies whether the hash code calculated by itself is the same as the decrypted hash code and decides to accept or reject $msg_i$ (step 8).

**Fig. 4.** Information transmission in Raise
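The checks performed in the IT phase can be sketched with the Python standard library as follows. This is a simplified illustration under several assumptions: HMAC-SHA256 and SHA-256 stand in for the unspecified MAC and hash functions, the timestamp is carried next to the message, the encryption of $h_i$ (steps 5 and 6) is omitted, and a message is treated as fresh when its age is strictly below the threshold. All function names are hypothetical.

```python
import hashlib
import hmac


def make_vm(msg: bytes, key: bytes, timestamp_ms: int) -> dict:
    """Step 1: the sender attaches the MAC of msg computed with its symmetric key."""
    mac = hmac.new(key, msg, hashlib.sha256).digest()
    return {"msg": msg, "mac": mac, "ts": timestamp_ms}


def is_fresh(vm: dict, now_ms: int, threshold_ms: int) -> bool:
    """Step 2: the receiver drops messages that are older than the threshold."""
    return now_ms - vm["ts"] < threshold_ms


def rsu_check(vm: dict, key: bytes):
    """Steps 3 and 4: RSU verifies the MAC and, if correct, returns the hash of msg."""
    expected = hmac.new(key, vm["msg"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, vm["mac"]):
        return None
    return hashlib.sha256(vm["msg"]).digest()


def receiver_accepts(vm: dict, h_from_rsu, now_ms: int, threshold_ms: int) -> bool:
    """Steps 7 and 8: the receiver recomputes the hash and compares it with RSU's."""
    if h_from_rsu is None or not is_fresh(vm, now_ms, threshold_ms):
        return False
    own_hash = hashlib.sha256(vm["msg"]).digest()
    return hmac.compare_digest(own_hash, h_from_rsu)
```

A message whose content is altered without knowledge of the key fails the MAC check at RSU, and a replayed message fails the freshness check at the receiver. These correspond to the integrity (R8) and freshness (R9) requirements.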

To model Raise in Uppaal-SMC, interactions among vehicles and RSU (i.e., sending/receiving messages) are modeled by *synchronization channels* [31] and global variables. The cryptographic operations in Raise refer to public and private key encryption and decryption, i.e., a message encrypted by a public key can be decrypted using the corresponding private key, and vice versa. The automaton of a cryptographic device [6] is adopted to model the encryption and decryption. Figure 5 presents the STA capturing the behaviors of vehicle $v_i$ and RSU in SKA. *startEn* (resp. *startDe*) and *finEn* (resp. *finDe*) are channels indicating the start and finish of encryption (resp. decryption). The encryption/decryption result is denoted *en_res*/*de_res*. In the STA, the names of locations indicate the corresponding steps pictured in Fig. 3.

The IT phase from $v_0$ to $v_1$ is established with the help of RSU and modeled as the STA shown in Fig. 6 (the transmission from $v_1$ to $v_2$ can be modeled similarly). The behaviors of $v_0$ (sender), $v_1$ (receiver) and RSU in the IT phase are modeled in the *IT_v0*, *IT_v1* and *IT_RSU* STA, respectively.

The SKA (or IT) succeeds if each step of the SKA (IT) is completed correctly within a given time interval, modeled by invariant "t ≤ d" (the value of *d* varies in different steps). If timeout occurs (i.e., "t ≥ d"), *fail* location will be activated and the procedure is restarted from the initial step.

**Fig. 5.** Uppaal-SMC model of SKA

**Fig. 6.** Uppaal-SMC model of IT

## **4.2 Modeling of Attacks in UPPAAL-SMC**

We present the modeling of three types of attacks commonly used in the security analysis, i.e., message falsification, message replaying and message spoofing attacks [2]. The models of attacks are illustrated in Fig. 7, where the *ls* parameter (*ls* ∈ [0, 100]) serves as an indicator of level of adversarial strength while *qc* (*qc* ∈ [0, 100]) is an indicator of the adversarial channel quality.

**Message Falsification Attack** (MFA) aims to falsify messages transmitted from $v_i$ to $v_{i+1}$, and is modeled as the MFA STA in Fig. 7. As described earlier, in Raise, RSU verifies the authenticity of messages by checking the correctness of the MAC code of messages. To deceive the RSU about the validity of the modified message and avoid exposing itself to RSU, MFA attempts to obtain the symmetric key and uses the key to compute the MAC code of the falsified message. At the *s1* state, MFA eavesdrops on $rm_i$ (generated at step 5 in Fig. 3), which contains the information for symmetric key generation (i.e., *g*, *a*, *b*). It tries to decrypt $rm_i$ when receiving it via sendrm[i]?. The probability that the decryption succeeds is $ls\%$, modeled by probabilistic choices [31] (dashed edges) with probability weights $\frac{ls}{100}$ and $\frac{100-ls}{100}$. If the decryption succeeds, MFA obtains the symmetric key of $v_i$ based on the decrypted result (getKey(de_res)). Finally, it modifies the content of the message using the key, and tries to send the modified message to $v_{i+1}$ (sendvm[i]!). The probability that the message can be sent successfully is $(100-qc)\%$. In our setting, MFA modifies the $speed_i$ field in the message into a random value in [100, 120], and changes the direction to $direction_i = 4$, which indicates that $v_i$ is running in the positive direction of the y-axis.

**Fig. 7.** STA of attacks

**Message Replaying Attack** (MRA) aims to replay obsolete messages that contain old information. The MRA STA represents an MRA that replays messages sent by $v_i$. Upon capturing a message (via sendvm[i]?), MRA stores the message (*m = vm[i]*) and tries to replay it at a later time (i.e., after *10* s). The probability that the attacker can replay the message successfully is $(100-qc)\%$.

**Message Spoofing Attack** (MSA) impersonates a vehicle ($v_i$) in order to inject fraudulent information into its subsequent vehicle ($v_{i+1}$). Similar to MFA, the MSA STA first obtains the symmetric key of $v_i$ by detecting and decrypting $rm_i$. It then fabricates a new message whose content is "$brake_i = 0$, $speed_i = 0$, $direction_i = 4$, $x_i = 0$, $y_i = 10$" (denoted "*encode(i)*") and tries to send the message to $v_{i+1}$ (sendvm[i]!), with the probability of the message being sent successfully being $(100-qc)\%$.
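The two probabilistic choices in the attack STA can be mimicked by the following Python sketch, which samples a single attack attempt. It assumes that the two choices are independent and that an attack needing the key (MFA, MSA) succeeds only if both the decryption and the transmission succeed, while MRA needs only the transmission. The function names are hypothetical.

```python
import random


def attack_attempt(ls, qc, rng, needs_key=True):
    """Sample one attempt: decryption succeeds with ls%, sending with (100 - qc)%."""
    if needs_key and not rng.random() < ls / 100:
        return False  # decryption of rm_i failed, so no key is obtained
    return rng.random() < (100 - qc) / 100


def per_attempt_probability(ls, qc, needs_key=True):
    """Closed form of the same quantity under the independence assumption."""
    p_send = (100 - qc) / 100
    return (ls / 100) * p_send if needs_key else p_send
```

The value computed here is the success probability of one attempt only. The time-bounded probabilities estimated in Sect. 6 aggregate many attempts over a run and are therefore not directly comparable.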

## **5 Representation of S/S Related Timing Constraints in UPPAAL-SMC**

To enable the formal verification of S/S related timing constraints (given in Sect. 3), we first investigate how to specify those constraints in PrCcsl. Then, translation from PrCcsl specifications of the constraints into verifiable STA is demonstrated. Furthermore, a tool ProTL that supports the automatic transformation based on the proposed translation rules is introduced.

#### **5.1 Specifications of S/S Related Timing Constraints in PrCCSL**

The specifications of R1–R11 are presented in Table 1, where *ac* is a clock that always ticks while *nc* represents a clock that never ticks. R1 is specified as an exclusion *relation* between *xdir* (the event that the vehicles are running in the positive direction of the x-axis) and *ovtake* (the event that the position of follower $v_1$ on the x-axis is greater than that of leader $v_0$). Similarly, R7 and R9 can be specified as exclusion *relations*.

In the specification of R2, *stopD* is a clock generated by delaying *stopSign* (the event that the leader vehicle detects a stop sign) for 2000 ms. *vstop* refers to the event that the three vehicles are completely stopped, which should occur no later than *stopD*. Hence, R2 is expressed as a causality *relation* between *vstop* and *stopD*. R3–R5 can be specified in a similar manner.

**Table 1.** PrCCSL specifications of R1–R11

R6 (authenticity) is expressed as a subclock *relation* between *msgRec* and *msgSent*, where *msgRec* (*msgSent*) represents the event that a message is received (sent) by the follower (leader) vehicle. R8 is specified as a coincidence *relation* between *msgRec* and *validMsg*, where *validMsg* is a clock that ticks with *msgRec* when the received message *rMsg* is identical to the sent message *sMsg* (i.e., *rMsg == sMsg*). For R10, *startSKA* (*finSKA*) represents the start (completion) of SKA. *startSKADe* is a clock constructed by delaying *startSKA* for 600 ms. R10 requires that *finSKA* must occur before *startSKADe*. R11 states that two consecutive occurrences of *msgSent* must have an interval of [*period* − *jitter*, *period* + *jitter*] ms (i.e., [100, 300] ms). In the specification of R11, *fclk* is a clock generated by filtering out the $1^{st}$ tick of *msgSent*. *sentDe1* and *sentDe2* are two clocks generated by delaying *msgSent* for 100 ms and 300 ms, respectively. R11 can be interpreted as: $\forall i \in \mathbb{N}^+$, the $i^{th}$ tick of *fclk* should occur later than the $i^{th}$ tick of *sentDe1* but prior to the $i^{th}$ tick of *sentDe2*.
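The intent of R11 can be illustrated directly on a list of send times, without constructing the clocks *fclk*, *sentDe1* and *sentDe2*. The Python sketch below checks that every gap between consecutive sends lies in [*period* − *jitter*, *period* + *jitter*]. Treating the bounds as inclusive is an assumption of this sketch, and the function name is hypothetical.

```python
def satisfies_r11(send_times_ms, period=200, jitter=100):
    """True iff all consecutive gaps lie within [period - jitter, period + jitter]."""
    lower, upper = period - jitter, period + jitter
    gaps = [later - earlier
            for earlier, later in zip(send_times_ms, send_times_ms[1:])]
    return all(lower <= gap <= upper for gap in gaps)
```

For example, the send times 0, 200, 450 and 560 ms have gaps of 200, 250 and 110 ms and satisfy the constraint, whereas a gap of 50 ms or 400 ms violates it.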

#### **5.2 Translation of PrCCSL into STA**

We present how the S/S related timing constraints specified in PrCcsl can be transformed into STA and probabilistic queries in Uppaal-SMC. We first describe how clock tick and history (introduced in Sect. 2) can be represented in Uppaal-SMC. Using the mapping, we then demonstrate that *expressions* and *relations* in PrCcsl can be translated into STA and queries.

In the earlier work [14], the semantics of PrCcsl operators are translated into STA based on discrete time, i.e., the continuous physical time is discretized into a set of equalized steps. As a result, two clock instants are still considered coincident even if they are one time step apart. To alleviate this restriction and enable the representation of PrCcsl that pertains to continuous real-time semantics, the mapping patterns are refined: two clock instants are coinstantaneous only if the time difference between them is insignificant, i.e., the time difference between them is less than a positive infinitesimal value *e*, e.g., *e* = 0.000001.
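The refined notion of coincidence can be sketched in Python on timestamped ticks. The sketch assumes that each clock is given as a sorted list of tick times and that the ticks of one clock are further apart than *e*, so that ticks can be compared pairwise. The function names are hypothetical.

```python
E = 0.000001  # tolerance e from the text


def coinstantaneous(t1, t2, e=E):
    """Two instants coincide if their time difference is smaller than e."""
    return abs(t1 - t2) < e


def coincident_clocks(ticks1, ticks2, e=E):
    """Both clocks have the same number of ticks and the ticks coincide pairwise."""
    return len(ticks1) == len(ticks2) and all(
        coinstantaneous(a, b, e) for a, b in zip(ticks1, ticks2))
```

Under this definition, two ticks that are a full discretization step apart are no longer treated as coincident.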

In PrCcsl, a logical clock represents an event and the instants of the clock correspond to the occurrences of the event. A logical clock c is represented as a *synchronization channel* c! in Uppaal-SMC. The history of c is modeled as the STA shown in Fig. 8: whenever c occurs (c?), the value of its history is increased by 1 (i.e., *h++*).

**Fig. 8.** History

Based on the mapping patterns of tick and history, the PrCcsl *expressions* (including ITE, DelayFor and FilterBy), as well as *relations* (including subclock, coincidence, exclusion and precedence), can be represented as the STA and queries shown in Fig. 9.

The STA of *expressions* trigger the ticks of the new clock (denoted res!) based on the occurrences of existing clocks. To represent *relations*, observer STA that capture the semantics of the standard subclock, coincidence, exclusion and precedence *relations* are constructed. Each observer STA contains a "*fail*" location (see Fig. 9), which indicates the violation of the corresponding *relation*. Recalling the definitions of PrCcsl in Sect. 2, the probability of a *relation* being satisfied is interpreted as the ratio of runs that satisfy the *relation* among all runs. It is specified as a *Hypothesis Testing* query in Uppaal-SMC, $H_0$: $\frac{m}{k} \ge p$ against $H_1$: $\frac{m}{k} < p$, where $m$ is the number of runs satisfying the given *relation* out of all $k$ runs. As a result, the probabilistic *relations* are interpreted as the query (see Fig. 9): $Pr[bound]([\,]\ \neg STA.fail) \ge p$, which means that the probability of the "*fail*" location of the observer STA never being reached should be greater than or equal to $p$. The STA of *expressions* and *relations* are composed with the system NSTA in parallel. Then, the probabilistic analysis is performed over the composite NSTA, which enables us to verify the S/S related timing constraints over the entire system using Uppaal-SMC.

**Tool support:** Manual translation of PrCcsl specifications into Uppaal models for verification can be time-consuming and error-prone. To improve the accuracy and efficiency of translation, we implement a tool ProTL (Probabilistic-Ccsl TransLator) [26] that provides a push-button transformation from PrCcsl specifications into corresponding STA and queries. Furthermore, verification and simulation support is provided in ProTL by employing Uppaal-SMC as the backend analysis engine. ProTL encompasses the following features: (1) An editor for editing PrCcsl specifications of requirements (stored as *".txt"* files); (2) Automated transformation of PrCcsl specifications into Uppaal-SMC STA; (3) Integration of the STA and the system behavioral model (imported by users); (4) A configuration palette for setting parameters (e.g., time bound of simulation, number of simulations) used for verification and simulation; (5) Automatic generation of probabilistic queries (introduced in Sect. 2) based on user-specified parameters; (6) Capability of performing verification and simulation on PrCcsl specifications against the integrated model and generated queries.

**Fig. 9.** STA of PrCcsl operators

The GUI of ProTL is implemented with the Python package Tkinter [27]. The *Translator* is implemented with ANother Tool for Language Recognition (ANTLR) [24], a parser generator that constructs parsers for a language by analyzing a user-defined syntax of the language. We specified the syntax of PrCcsl in Backus-Naur Form (BNF) and applied ANTLR to generate a *parser* that can analyze and recognize encodings in the format of PrCcsl. The *parser* reads the PrCcsl specifications and generates abstract syntax trees (AST), i.e., an intermediate form that has a tree structure. By traversing the AST, the information (i.e., operators and parameters) of PrCcsl can be extracted and used to generate the corresponding STA.

## **6 Experiment**

To identify vulnerabilities of system to external malicious attackers, we combine the refined CAS system model (including the models of Raise protocol) with models of three different attackers. Formal verification on S/S related timing constraints (R1–R11) for the combined model is performed by Uppaal-SMC. The combined CAS model contains the stochastic behaviors in terms of the unpredictable environments (e.g., the traffic signs are randomly recognized by the leader vehicle of CAS and the probability of each sign type occurring is equally set as 16.7%), as well as the indeterministic behaviors modeled by weighted probability choices in the STA of attacks (see Fig. 7). In our setting, ls and qc are configured as 10 and 90, respectively. To estimate the probability of an attack being launched on CAS successfully, *Probability Estimation* query is applied to check the probability that the "attack" location in each attack STA is reachable from the system NSTA. The time bound of the verification is set as 10000. The probability of message falsification, message replaying and message spoofing attack being successfully completed by the corresponding attacker is within the range of [0.109, 0.209], [0.563, 0.663] and [0.143, 0.243], respectively.

In our experiments, S/S related timing constraints are specified in PrCcsl and transformed into STA using ProTL. Each constraint is specified as a PrCcsl *relation* (as described in Sect. 5.1) whose probability threshold is 95%. The verification results are demonstrated in Table 2, in which "√" denotes the corresponding requirement is satisfied while "×" indicates the violation of the requirement: Under the message replaying attack, all the S/S timing constraints are established as valid with 95% level of confidence. In the message falsification attack, the secrecy and integrity properties (R7 and R8), as well as three safety properties (R3–R5), are violated. The MSA damages the authenticity (R6) and secrecy (R7) of communication, and leads to the violations of four safety properties, i.e., R1 and R3–R5.


**Table 2.** Verification results of timing constraints under different attacks

The experiment results indicate the severity of the impacts on safety and security caused by the demonstrated attacks on CAS: no requirement is violated under the MRA scenario, while the MSA causes violations of most safety properties. When CAS is composed with the STA of MSA or MFA, the secrecy of the symmetric key is violated. With the obtained symmetric key, MSA can masquerade as a legitimate vehicle and MFA is able to tamper with the content of messages without being detected, leading to the violations of authenticity (R6) and integrity (R8), respectively. To explore how the malicious attackers can influence the safety of the system, we conduct simulations by using *Simulations* queries. The simulation results in Fig. 10 illustrate how an MSA drives the system to undesirable states.

**Fig. 10.** Simulation results of R1 and R4: (a) At *Time* = 2345, the attack occurs (indicated by the rising edge of the red line). The MSA sends fabricated position information of *V*<sub>0</sub> to *V*<sub>1</sub> (the value of *recx* becomes 0), which tricks *V*<sub>1</sub> into thinking that the distance between *V*<sub>0</sub> and *V*<sub>1</sub> exceeds the maximum limit. *V*<sub>1</sub> keeps increasing its speed (*speed*<sub>1</sub>), which leads to the collision (indicated by *x*<sub>0</sub> == *x*<sub>1</sub>) at *Time* = 3815 and violates R1. (b) When an attack takes place at *Time* = 2496 (indicated by the rising edge of the blue line), *V*<sub>1</sub> receives the message from the attacker and is deluded into believing that the speed of *V*<sub>0</sub> is 0. Therefore, *V*<sub>1</sub> keeps decreasing its speed even if the distance between *V*<sub>0</sub> and *V*<sub>1</sub> becomes greater than 100 m, which violates R4. (Color figure online)

## **7 Related Work**

Formal verification of (non-)functional properties of automotive systems containing stochastic behaviors was investigated in several works [13–15]. In these works, systems are assumed by default to be resilient to security threats and the safety properties are analyzed in the absence of malicious attacks, which is inadequate for the design of automotive systems interconnected via wireless communication. Combined analysis of safety and security (S/S) properties for interconnected cyber-physical systems has been addressed in earlier works [1,21,29], which are, however, limited to theoretical frameworks and high-level descriptions of S/S properties without support for formal verification. Pedroza et al. [25] proposed a SysML-based environment called AVATAR for the formal verification of S/S properties, which enables assessment of the impact of cyber-security threats on functional safety. Wardell et al. [32] proposed an approach for identifying security vulnerabilities of industrial control systems by modeling malicious attacks as PROMELA models amenable to formal verification. However, those approaches lack precise probabilistic annotations specifying stochastic properties regarding S/S aspects. Kumar et al. [18] introduced the attack-fault tree formalism for describing attack scenarios and conducted formal analysis using Uppaal-SMC to obtain quantitative estimates of the impact of system failures or security threats. In contrast, our work is based on the probabilistic extension of S/S related timing constraints, with a focus on probabilistic verification of the extended constraints.

## **8 Conclusion**

This paper presents a model-based approach for probabilistic formal analysis of safety and security (S/S) related timing constraints for interconnected automotive systems in East-adl at the early design phase. The behavioral model of the automotive system in Uppaal-SMC is refined by adding models of the vehicular communication protocol and of malicious attacks, which makes it possible to explore the impact of an adversarial environment on the S/S of the system. Timing constraints are specified in PrCcsl and translated into stochastic timed automata (STA) amenable to formal verification using Uppaal-SMC. A set of translation rules from PrCcsl to STA, as well as the corresponding tool support for automating the translation, are provided. We demonstrate our approach by performing formal verification on a cooperative automotive system (CAS) case study. Although we have shown one-to-one mapping patterns from a subset of PrCcsl operators to STA for conducting formal verification of timing constraints using Uppaal-SMC, systematic and formal translation techniques covering the full set of PrCcsl constraints are being studied as ongoing work. Furthermore, new features of ProTL for the analysis of Uppaal-SMC models involving a wider range of variable/query types (e.g., *urgent channels*, *bounded integers*) are under development.

**Acknowledgment.** This work is supported by the EASY project funded by NSFC, a collaborative research between Sun Yat-Sen University and University of Southern Denmark.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Checking Observational Purity of Procedures**

Himanshu Arora<sup>1</sup>, Raghavan Komondoor<sup>1</sup>, and G. Ramalingam<sup>2(✉)</sup>

<sup>1</sup> Indian Institute of Science, Bangalore, India
{himanshua,raghavan}@iisc.ac.in

<sup>2</sup> Microsoft Research, Bellevue, WA, USA
grama@microsoft.com

**Abstract.** Verifying whether a procedure is *observationally pure* (that is, it always returns the same result for the same input argument) is challenging when the procedure uses mutable (private) global variables, e.g., for memoization, and when the procedure is recursive.

We present a deductive verification approach for this problem. Our approach encodes the procedure's code as a logical formula, with recursive calls being modeled using a mathematical function symbol *assuming that the procedure is observationally pure*. Then, a theorem prover is invoked to check whether this logical formula agrees with the function symbol referred to above in terms of input-output behavior for all arguments. We prove the soundness of this approach.

We then present a conservative approximation of the first approach that reduces the verification problem to one of checking whether a quantifier-free formula is satisfiable and prove the soundness of the second approach.

We evaluate our approach on a set of realistic examples, using the Boogie intermediate language and theorem prover. Our evaluation shows that the invariants are easy to construct manually, and that our approach is effective at verifying observationally pure procedures.

## **1 Introduction**

A procedure in an imperative programming language is said to be *observationally pure* (OP) if for each specific argument value it has a specific return value, across all possible sequences of calls to the procedure, irrespective of what other code runs between these calls. In other words, the input-output behavior of an OP procedure mimics a mathematical function.

A deterministic procedure that does not read any pre-existing state other than its arguments is trivially OP. However, it is common for procedures to update and read global variables, typically for performance optimization, while still being OP. In this paper, we focus on the problem of checking observational purity of procedures that read and write global variables, especially in the presence of recursion, which makes the problem harder.

```
 1
 2  int g := -1;
 3  int lastN := 0;
 4  int factCache(int n) {
 5    if (n <= 1) {
 6      result := 1;
 7    } else if (g != -1 && n == lastN) {
 8      result := g;
 9    } else {
10      g = n * factCache(n - 1);
11      lastN = n;
12      result := g;
13    }
14    return result;
15  }
```
**Listing 1.1.** Procedure factCache: returns n!, and memoizes most recent result.

*Motivating Example.* We use procedure 'factCache' in Listing 1.1 as our running example. It returns n! for a given argument n, and caches the return value of the most recent call. It uses two *private global* variables, g and lastN, to implement the caching. g is initialized to −1. From the first call with an argument greater than 1 onwards, g stores the return value of the most recent such call, and lastN stores the argument of that call. Clearly this procedure is OP, and mimics the input-output behavior of a factorial procedure that does not cache any results.
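To make the caching behavior concrete, the following is a direct Python port of Listing 1.1, given as an illustrative sketch. The class name `FactCacheLib` and the attribute name `last_n` are our own choices; the private globals are modeled as instance attributes so that the state cannot be touched by other code by accident.

```python
class FactCacheLib:
    """Library with private state g and last_n (lastN in Listing 1.1)."""

    def __init__(self):
        self.g = -1        # cached result, -1 means "nothing cached yet"
        self.last_n = 0    # argument for which g was computed

    def fact_cache(self, n):
        if n <= 1:
            result = 1
        elif self.g != -1 and n == self.last_n:
            result = self.g                       # cache hit
        else:
            self.g = n * self.fact_cache(n - 1)   # recursive call
            self.last_n = n
            result = self.g
        return result


lib = FactCacheLib()
first = lib.fact_cache(5)    # computed recursively, fills the cache
second = lib.fact_cache(5)   # answered from the cache
third = lib.fact_cache(3)    # overwrites the cache with the entry for 3
fourth = lib.fact_cache(5)   # recomputed, same value as before
print(first, second, third, fourth)
```

The four calls touch the cache in different ways, yet calls with equal arguments return equal results, which is the behavior that observational purity demands.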

*Proposed Approach.* Our approach is based on Floyd-Hoare logic, which typically requires a specification of the procedure to be provided. One candidate specification would be a full functional specification of the procedure. If the user specifies that factCache realizes n!, then the verifier could replace Line 10 in the code with 'g = n \* (n − 1)!'. This, on paper, is sufficient to assert that Line 12 always assigns n! to result. However, to establish that Line 8 also does the same, an invariant would need to be provided that describes the possible values of g before an invocation of the procedure. In our example, a suitable invariant would be '(g = −1) ∨ (g = lastN!)'. The verifier would also need to verify that the invariant is re-established at the procedure's exit. Lines 10–12, with the recursive call replaced by (n − 1)!, suffice on paper to re-establish the invariant.

The candidate approach described above, while plausible, suffers from two weaknesses. First, a mathematical specification of the function being computed may be complex and non-trivial to write. (Note, for example, that factCache is defined for negative integers while factorial is not. Thus, the previous candidate specification is actually incorrect for this edge case.) Second, the underlying theorem prover would need to prove complex arithmetic properties, e.g., that n \* (n − 1)! is equal to n!. Complex proofs such as this may be beyond the scope of many existing theorem provers.

Our key insight is to sidestep the challenges mentioned by introducing a function symbol, say *factCache*, and replacing the recursive call for the purposes of verification with this function symbol. (Note that we reuse the same symbol for two purposes, which may be slightly confusing here. One denotes the procedure name, while the other denotes a function symbol for use in a logical formula. The italicized name here denotes the function symbol.) Intuitively, *factCache* represents the mathematical function that the given procedure mimics *if* the procedure is OP. In our example, Line 10 would become 'g = n \* *factCache*(n − 1)'. This step needs no human involvement. The approach needs an invariant; however, in a novel manner, we allow the invariant also to refer to *factCache*. In our example, a suitable invariant would be '(g = −1) ∨ (g = lastN \* *factCache*(lastN − 1))'. This sort of invariant is relatively easy to construct; e.g., a human could arrive at it just by looking at Line 2 and by local reasoning about Lines 10 and 11. Given this invariant, (a) a theorem prover could infer that the condition in Line 7 implies that Line 8 necessarily copies the value of 'n \* *factCache*(n − 1)' into 'result'. Due to the transformation of Line 10 mentioned above, (b) the theorem prover can infer that Line 12 also does the same. Note that since these two expressions are syntactically identical, a theorem prover can easily establish that they are equal in value. Finally, since Line 6 is reached under a different condition than Lines 8 and 12, the verifier has finished establishing that the procedure always returns the same expression in n for any given value of n.

Similarly, using the modified Line 10 mentioned above and from Line 11, the prover can re-establish that g is equal to 'lastN \* *factCache*(lastN − 1)' when control reaches Line 12. Hence, the necessary step of proving the given invariant to be a valid invariant is also complete.

Note, the effectiveness of the approach depends on the nature of the given invariant. For instance, if the given invariant was '(g = −1) ∨ (g = lastN!)', which is also technically correct, then the theorem prover may not be able to establish that in Lines 8 and 12 the variable 'g' always stores the same expression in n. However, it is our claim that in fact it is the invariant '(g = − 1) ∨ (g = lastN \* *factCache*(lastN − 1))' that is easier to infer by a human or by a potential tool, as justified by us two paragraphs above.

*Salient Aspects of Our Approach.* This paper makes two significant contributions. First, it tackles the circularity problem that arises due to the use of a presumed-to-be OP procedure in assertions and invariants and the use of these invariants in proving the procedure to be OP. This requires us to prove the soundness of an approach that *simultaneously* verifies observational purity as well as the validity of invariants (as they cannot be decoupled).

Secondly, we show that a direct approach to this verification problem (which we call the existential approach) reduces it to a problem of verifying that a logical formula is a tautology. The structure of the generated formula, however, makes the resulting theorem prover instances hard. We show how a conservative approximation can be used to convert this hard problem into an easier problem of checking satisfiability of a quantifier-free formula, which is something within the scope of state-of-the-art theorem provers.

The most closely related previous approaches are by Barnett et al. [1,2], and by Naumann [3]. These approaches check observational purity of procedures that maintain mutable global state. However, none of these approaches use a function symbol in place of recursive calls or within invariants. Therefore, it is not clear that these approaches can verify recursive procedures. Barnett et al., in fact, state "there is a circularity - it would take a delicate argument, and additional conditions, to avoid unsoundness in this case". To the best of our knowledge, ours is the first paper to show that it is feasible to check observational purity of procedures that maintain mutable global state for optimization purposes and that make use of recursion.

```
   L ∈ Lib   ::= g := c P
   P ∈ Proc  ::= p (x) { S; return y }
   S ∈ Stmt  ::= x := e | x := p(y) | S ; S | if (e) then S else S
   e ∈ Expr  ::= c | x | e op e | unop e
  op ∈ Ops   ::= + | - | / | * | % | > | < | == | ∧ | ∨
unop ∈ UnOps ::= ¬
   x, y ∈ LocalId ∪ GlobalId, g ∈ GlobalId, c ∈ V, p ∈ ProcId
```
**Fig. 1.** Programming language syntax and meta-variables

Being able to verify that a procedure is OP has many potential applications. The most obvious one is that OP procedures can be memoized. That is, input-output pairs can be recorded in a table, and calls to the procedure can be elided whenever an argument is seen more than once. This would not change the semantics of the overall program that calls the procedure, because the procedure always returns the same value for the same argument (and mutates only private global variables). Another application is that if a loop contains a call to an OP procedure, then the loop can be parallelized (provided the procedure is modified to access and update its private global variables in a single atomic operation).

The rest of this paper is structured as follows. Section 2 introduces the core programming language that we address. Section 3 provides formal semantics for our language, as well as definitions of invariants and observational purity. Section 4 describes our approach formally. Section 5 discusses an approach for generating an invariant automatically in certain cases. Section 6 describes evaluation of our approach on a few realistic examples. Section 7 describes related work. More details about the proofs and the examples can be found in [4].

## **2 Language Syntax**

In this paper, we assume that the input to the purity checker is a library consisting of one or more procedures, with shared state consisting of one or more variables that are private to the library. We refer to these variables as "global" variables to indicate that they retain their values across multiple invocations of the library procedures, but they cannot be accessed or modified by procedures outside the library (that is, the clients of the library).

In Fig. 1, we present the syntax of a simple programming language that we address in this paper. Given the foundational focus of this work, we keep the programming language very simple, but the ideas we present can be generalized. A return statement is required in each procedure, and is permitted only as the last statement of the procedure. The language does not contain any looping construct; loops can be modelled as recursive procedures. The formal parameters of a procedure are read-only and cannot be modified within the procedure. We omit types from the language and permit only variables of primitive types. In particular, the language does not allow pointers or dynamic memory allocation. Note that expressions are pure (that is, they have no side effects) in this language, and a procedure call is not allowed in an expression. Each procedure call is modelled as a separate statement.
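As an illustration, the grammar of Fig. 1 can be written down as an abstract syntax tree. The sketch below uses Python dataclasses; all class and field names are our own choices and are not part of the paper's formalism. It also shows why a call such as `r := n * p(n - 1)` has to be split: `Call` is a statement and cannot appear inside an expression.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:            # c
    value: int


@dataclass(frozen=True)
class Var:              # x
    name: str


@dataclass(frozen=True)
class BinOp:            # e op e
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class UnOp:             # unop e
    op: str
    operand: "Expr"


Expr = Union[Const, Var, BinOp, UnOp]


@dataclass(frozen=True)
class Assign:           # x := e
    target: str
    expr: Expr


@dataclass(frozen=True)
class Call:             # x := p(y)
    target: str
    proc: str
    arg: str


@dataclass(frozen=True)
class Seq:              # S ; S
    first: "Stmt"
    second: "Stmt"


@dataclass(frozen=True)
class If:               # if (e) then S else S
    cond: Expr
    then: "Stmt"
    orelse: "Stmt"


Stmt = Union[Assign, Call, Seq, If]


@dataclass(frozen=True)
class Proc:             # p (x) { S; return y }
    name: str
    param: str
    body: Stmt
    ret: str


def count_calls(s: Stmt) -> int:
    """Count call statements; calls can never hide inside an expression."""
    if isinstance(s, Call):
        return 1
    if isinstance(s, Seq):
        return count_calls(s.first) + count_calls(s.second)
    if isinstance(s, If):
        return count_calls(s.then) + count_calls(s.orelse)
    return 0


# r := n * p(n - 1) is written as three statements.
body = Seq(
    Assign("t1", BinOp("-", Var("n"), Const(1))),
    Seq(Call("t2", "p", "t1"), Assign("r", BinOp("*", Var("n"), Var("t2")))),
)
proc = Proc(name="p", param="n", body=body, ret="r")
print(count_calls(proc.body))
```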

For simplicity of presentation, without loss of conceptual generality, we assume that the library consists of a single (possibly recursive) procedure, with a single formal parameter. In the sequel, we will use the symbol p (as a metavariable) to represent this library procedure as well as the *name* of this procedure, and will assume that the name of the formal parameter is n. If the procedure is of the form "p (n) { S; return r }", we refer to r as the *return* variable, and refer to "S; return r" as the *procedure body* and denote it as body(p). The library also contains, outside of the procedure's code, a sequence of initializing declarations of the global variables used in the procedure, of the form "g1 := c1; ...; gN := cN". These initializations are assumed to be performed once during any execution of the client application, just before the first call to the procedure p is placed by the client application.

Throughout this paper we use the word 'procedure' to refer to the library procedure p, and use the word 'function' to refer to a mathematical function.

## **3 A Semantic Definition of Purity**

In this section, we formalize the input-output semantics of the procedure p as a relation ⇝<sub>p</sub>, where n ⇝<sub>p</sub> r indicates that an invocation of p with input n may return a result of r. The procedure is defined to be observationally pure if the relation ⇝<sub>p</sub> is a (partial) function: that is, if n ⇝<sub>p</sub> r<sub>1</sub> and n ⇝<sub>p</sub> r<sub>2</sub>, then r<sub>1</sub> = r<sub>2</sub>.

The object of our analysis is a single-procedure library, not the entire (client) application. (Our approach can be generalized to handle multi-procedure libraries.) The result of our analysis is valid for any client program that uses the procedure/library. The only assumptions we make are: (a) the shared state used by the library (the global variables) is private to the library and cannot be modified by the rest of the program, and (b) the client invokes the library procedures sequentially: no concurrent or overlapping invocations of the library procedures by a concurrent client are permitted.

The following semantic formalism is motivated by the above observations. It can be seen as the semantics of the so-called "most general sequential client" of procedure p, which is the program: while (\*) x = p (random());. The executions (of p) produced by this program include all possible executions (of p) produced by all sequential clients.

Let G denote the set of global variables. Let L denote the set of local variables. Let V denote the set of numeric values (that the variables can take). An element ρ<sub>g</sub> ∈ Σ<sub>G</sub> = G → V maps global variables to their values. An element ρ<sub>ℓ</sub> ∈ Σ<sub>L</sub> = L → V maps local variables to their values. We define a *local continuation* to be a statement sequence ending with a return statement. We use a local continuation to represent the part of the procedure body that still remains to be executed. Let Σ<sub>C</sub> represent the set of local continuations. The set of runtime states (or simply, *states*) is defined to be (Σ<sub>C</sub> × Σ<sub>L</sub>)<sup>∗</sup> × Σ<sub>G</sub>, where the first component represents a runtime stack, and the second component the values of global variables. We denote individual states using symbols σ, σ<sub>1</sub>, σ<sub>i</sub>, etc. The runtime stack is a sequence, each element of which is a pair (S, ρ<sub>ℓ</sub>) consisting of the remaining procedure fragment S to be executed and the values of local variables ρ<sub>ℓ</sub>. We write (S, ρ<sub>ℓ</sub>)γ to indicate a stack where the topmost entry is (S, ρ<sub>ℓ</sub>) and γ represents the remaining part of the stack.

**Fig. 2.** A small-step operational semantics for our language, represented as a relation σ<sub>1</sub> →<sub>p</sub> σ<sub>2</sub>. A state σ<sub>i</sub> is a configuration of the form ((S, ρ<sub>ℓ</sub>)γ, ρ<sub>g</sub>) where S captures statements to be executed in current procedure, ρ<sub>ℓ</sub> assigns values to local variables, γ is the call-stack (excluding current procedure), and ρ<sub>g</sub> assigns values to global variables.

We say that a state ((S, ρ<sub>ℓ</sub>)γ, ρ<sub>g</sub>) is an *entry-state* if its location is at the procedure entry point (*i.e.*, if S is the entire body of the procedure), and we say that it is an *exit-state* if its location is at the procedure exit point (*i.e.*, if S consists of just a return statement).

A procedure p determines a single-step execution relation →<sub>p</sub>, where σ<sub>1</sub> →<sub>p</sub> σ<sub>2</sub> indicates that execution proceeds from state σ<sub>1</sub> to state σ<sub>2</sub> in a single step. Figure 2 defines this semantics. The semantics of evaluation of a side-effect-free expression is captured by a relation (ρ, e) ⇓ v, indicating that the expression e evaluates to value v in an *environment* ρ (by *environment*, we mean an element of (G ∪ L) → V). We omit the definition of this relation, which is straightforward. We use the notation ρ<sub>1</sub> ⊎ ρ<sub>2</sub> to denote the union of two disjoint maps ρ<sub>1</sub> and ρ<sub>2</sub>.
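The evaluation relation (ρ, e) ⇓ v is omitted above. One possible reading is sketched below in Python; it is an assumption of ours and not the paper's definition. Expressions are encoded as nested tuples (a hypothetical encoding), division is taken to be integer floor division, and boolean results are represented by Python booleans. The helper `disjoint_union` plays the role of the union of two disjoint maps.

```python
import operator

# Binary operators of Fig. 1; the division semantics is an assumption.
BINARY = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod,
    ">": operator.gt, "<": operator.lt, "==": operator.eq,
    "∧": lambda a, b: bool(a) and bool(b),
    "∨": lambda a, b: bool(a) or bool(b),
}


def disjoint_union(rho1, rho2):
    """Union of two maps that must not share any variable."""
    overlap = rho1.keys() & rho2.keys()
    if overlap:
        raise ValueError(f"maps are not disjoint: {sorted(overlap)}")
    return {**rho1, **rho2}


def evaluate(rho, e):
    """Return v such that (rho, e) evaluates to v."""
    kind = e[0]
    if kind == "const":
        return e[1]
    if kind == "var":
        return rho[e[1]]
    if kind == "un":                      # the only unary operator is ¬
        return not evaluate(rho, e[2])
    if kind == "bin":
        _, op, left, right = e
        return BINARY[op](evaluate(rho, left), evaluate(rho, right))
    raise ValueError(f"unknown expression form: {kind}")


# Environment built from local values and global values.
rho = disjoint_union({"n": 4}, {"g": 6, "lastN": 3})
product = evaluate(rho, ("bin", "*", ("var", "n"), ("var", "g")))
cond = ("bin", "∧",
        ("bin", "==", ("var", "lastN"), ("const", 3)),
        ("un", "¬", ("bin", "==", ("var", "g"), ("const", -1))))
holds = evaluate(rho, cond)
print(product, holds)
```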

Note that most rules capture the usual semantics of the language constructs. The last two rules, however, capture the semantics of the most-general sequential client explained previously: when the call stack is empty, a new invocation of the procedure may be initiated (with an arbitrary parameter value).

Note that all the following definitions are parametric over a given procedure p. E.g., we will use the word "execution" as shorthand for "execution of p".

We define an *execution* (of p) to be a sequence of states σ<sub>0</sub>σ<sub>1</sub> ⋯ σ<sub>n</sub> such that σ<sub>i</sub> →<sub>p</sub> σ<sub>i+1</sub> for all 0 ≤ i < n. Let σ<sub>init</sub> denote the *initial state* of the library; i.e., this is the element of Σ<sub>G</sub> that is induced by the sequence of initializing declarations of the library, namely, "g1 := c1; ...; gN := cN". We say that an execution σ<sub>0</sub>σ<sub>1</sub> ⋯ σ<sub>n</sub> is a *feasible* execution if σ<sub>0</sub> = σ<sub>init</sub>. Note, intuitively, a feasible execution corresponds to the sequence of states visited within the library across all invocations of the library procedure over the course of a single execution of the most-general client mentioned above; also, since the most-general client supplies a random parameter value to each invocation of p, in general multiple feasible executions of the library may exist.

We define a *trace* (of p) to be a substring π = σ<sub>0</sub> ⋯ σ<sub>n</sub> of a feasible execution such that: (a) σ<sub>0</sub> is an entry-state, (b) σ<sub>n</sub> is an exit-state, and (c) σ<sub>n</sub> corresponds to the return from the invocation represented by σ<sub>0</sub>. In other words, a trace is a state sequence corresponding to a single invocation of the procedure. A trace may contain within it nested sub-traces due to recursive calls, which are themselves traces. Given a trace π = σ<sub>0</sub> ⋯ σ<sub>n</sub>, we define *initial*(π) to be σ<sub>0</sub>, *final*(π) to be σ<sub>n</sub>, *input*(π) to be the value of the input parameter in *initial*(π), and *output*(π) to be the value of the return variable in *final*(π).

We define the relation ⇝<sub>p</sub> to be {(*input*(π), *output*(π)) | π is a trace of p}.

**Definition 1 (Observational Purity).** *A procedure* p *is said to be* observationally pure *if the relation* ⇝<sub>p</sub> *is a (partial) function: that is, if for all* n*,* r<sub>1</sub>*,* r<sub>2</sub>*, if* n ⇝<sub>p</sub> r<sub>1</sub> *and* n ⇝<sub>p</sub> r<sub>2</sub>*, then* r<sub>1</sub> = r<sub>2</sub>*.*
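Definition 1 can be explored by testing, although testing can only refute purity and never establish it. The Python sketch below runs a bounded version of the most-general sequential client, collects the observed input-output pairs, and checks whether they form a partial function. All names are hypothetical, and only top-level invocations are recorded (nested recursive sub-traces are not), so this is a simplification of the definition.

```python
import random


def observed_relation(make_procedure, calls=2000, max_arg=8, seed=1):
    """Bounded stand-in for: while (*) x = p(random());"""
    rng = random.Random(seed)
    p = make_procedure()              # fresh library state, as after init
    relation = set()
    for _ in range(calls):
        n = rng.randint(-2, max_arg)
        relation.add((n, p(n)))
    return relation


def is_partial_function(relation):
    """True if no input is related to two different outputs."""
    outputs = {}
    for n, r in relation:
        if outputs.setdefault(n, r) != r:
            return False
    return True


def make_fact_cache():
    state = {"g": -1, "lastN": 0}     # private globals of Listing 1.1

    def fact_cache(n):
        if n <= 1:
            return 1
        if state["g"] != -1 and n == state["lastN"]:
            return state["g"]
        state["g"] = n * fact_cache(n - 1)
        state["lastN"] = n
        return state["g"]

    return fact_cache


def make_counter():
    state = {"calls": 0}

    def impure(n):                    # result depends on the call history
        state["calls"] += 1
        return n + state["calls"]

    return impure


pure_looking = is_partial_function(observed_relation(make_fact_cache))
impure_found = not is_partial_function(observed_relation(make_counter))
print(pure_looking, impure_found)
```

A `True` result for `fact_cache` is only evidence over the sampled calls; the deductive approach of the paper is what turns such evidence into a proof.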

**Logical Formula and Invariants.** Our methodology makes use of *logical formulae* for different purposes, including to express a given *invariant*. Our logical formulae use the local and global variables in the library procedure as free variables, use the same operators as allowed in our language, and make use of universal as well as existential quantification. Given a formula ϕ, we write ρ |= ϕ to denote that ϕ evaluates to true when its free variables are assigned values from the environment ρ.

As discussed in Sect. 1, one of our central ideas is to allow the names of the library procedures to be referred to in the invariant; *e.g.*, our running example becomes amenable to our analysis using an invariant such as '(g = −1) ∨ (g = lastN \* *factCache*(lastN − 1))'. We therefore allow the use of library procedure names (in our simplified presentation, the name p) as free variables in logical formulae. Correspondingly, we let each environment ρ map each procedure name to a mathematical function in addition to mapping variables to numeric values, and extend the semantics of ρ |= ϕ by substituting the values of both variables and procedure names in ϕ from the environment ρ.

Given an environment ρ, a procedure name p, and a mathematical function f, we will write ρ[p → f] to indicate the updated environment that maps p to the value f and maps every other variable x to its original value ρ[x]. We will write (ρ, f) |= ϕ to denote that ρ[p → f] |= ϕ.

Given a state σ = ((S, ρ<sub>ℓ</sub>)γ, ρ<sub>g</sub>), we define env(σ) to be ρ<sub>ℓ</sub> ⊎ ρ<sub>g</sub>, and given a state σ = ([], ρ<sub>g</sub>), we define env(σ) to be just ρ<sub>g</sub>. We write (σ, f) |= ϕ to denote that (env(σ), f) |= ϕ. For any execution or trace π, we write (π, f) |= ϕ if for every entry-state and exit-state σ in π, (σ, f) |= ϕ. We now introduce another definition of observational purity.

**Definition 2 (Observational Purity wrt an Invariant).** *Given an invariant* ϕinv*, a library procedure* p *is said to satisfy* pure(ϕinv) *if there exists a function* f *such that for every trace* π *of* p*, output*(π) = f(*input*(π)) *and* (π, f) |= ϕinv*.*

It is easy to see that if procedure p satisfies pure(ϕinv) wrt any given candidate invariant ϕinv, then p is observationally pure as per Definition 1.
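Definition 2 can be illustrated at run time for the running example. The Python sketch below fixes one candidate function `f` as the interpretation of the function symbol *factCache*, checks the invariant '(g = −1) ∨ (g = lastN \* *factCache*(lastN − 1))' at every entry-state and exit-state, and checks that each output equals `f` applied to the input. The class and method names are our own, and a passing run is a test over a few calls, not a proof of pure(ϕinv).

```python
def f(n):
    """Candidate interpretation of the function symbol factCache."""
    return 1 if n <= 1 else n * f(n - 1)


def invariant(g, last_n):
    # (g = -1) ∨ (g = lastN * factCache(lastN - 1)), with factCache read as f
    return g == -1 or g == last_n * f(last_n - 1)


class CheckedFactCache:
    def __init__(self):
        self.g, self.last_n = -1, 0
        self.checked_states = 0

    def _check(self):
        assert invariant(self.g, self.last_n), (self.g, self.last_n)
        self.checked_states += 1

    def fact_cache(self, n):
        self._check()                           # entry-state
        if n <= 1:
            result = 1
        elif self.g != -1 and n == self.last_n:
            result = self.g
        else:
            self.g = n * self.fact_cache(n - 1)
            self.last_n = n
            result = self.g
        self._check()                           # exit-state
        assert result == f(n)                   # output = f(input)
        return result


lib = CheckedFactCache()
results = [lib.fact_cache(n) for n in [5, 5, 3, 7, 0, -4, 3]]
print(results, lib.checked_states)
```

Nested recursive invocations are checked as well, because each of them passes through `_check` at its own entry and exit.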

## **4 Checking Purity Using a Theorem Prover**

In this section we provide two different approaches that, given a procedure p and a candidate invariant ϕinv, use a theorem prover to check conservatively whether procedure p satisfies pure(ϕinv).

#### **4.1 Verification Condition Generation**

We first describe an adaptation of standard verification-condition generation techniques (*e.g.*, see [5]) that we use as a common first step in both our approaches. Given a procedure p and a candidate invariant ϕinv, our goal is to compute a pair (ϕpost, ϕvc) where ϕpost is a postcondition describing the state that exists after an execution of body(p) starting from a state that satisfies ϕinv, and ϕvc is a verification-condition that must hold true for the execution to satisfy its invariants and assertions.

We first transform the procedure body as below to create an internal representation that is input to the postcondition and verification condition generator. In the internal representation, we allow the following extra forms of statements (with their usual meaning): havoc(x), assume e, and assert e.


"assume" expression that refers to the function symbol p. In other words, there are no procedure calls in the transformed procedure.

3. We replace the "return x" statement by "assert ϕinv". Note that we intentionally do *not* assert that the return value equals p(n).

Let TB(p, ϕinv) denote the transformed body of procedure p obtained as above.
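The effect of the transformation on a call statement can be read off Listing 1.2: the call is replaced by an assertion of the invariant, a havoc of every global variable, and an assumption that ties the result to the function symbol. The Python sketch below reproduces this pattern for a flat list of statements. It is an illustration under our own assumptions: statements are hypothetical tuples, conditionals are not handled, and formulas are plain strings.

```python
def transform_call(target, proc, arg, global_vars, inv="ϕinv"):
    """Replace `target := proc(arg)` by assert / havoc / assume statements."""
    return (
        [f"assert {inv}"]
        + [f"havoc({g})" for g in global_vars]
        + [f"assume {inv} ∧ ({target} = {proc}({arg}))"]
    )


def transform_body(stmts, global_vars, inv="ϕinv"):
    """Transform a flat statement list; no procedure call survives."""
    out = []
    for s in stmts:
        if s[0] == "call":                  # ("call", x, p, y)
            _, target, proc, arg = s
            out.extend(transform_call(target, proc, arg, global_vars, inv))
        elif s[0] == "return":              # ("return", r)
            out.append(f"assert {inv}")
        else:                               # ("assign", x, e) is kept as is
            _, x, e = s
            out.append(f"{x} := {e}")
    return out


# The else-branch of factCache followed by the return statement.
branch = [
    ("assign", "t1", "n - 1"),
    ("call", "t2", "factCache", "t1"),
    ("assign", "g", "n * t2"),
    ("assign", "lastN", "n"),
    ("assign", "result", "g"),
    ("return", "result"),
]
transformed = transform_body(branch, ["g", "lastN"])
print("\n".join(transformed))
```

The printed statements correspond to the else-branch and the final assertion of Listing 1.2.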

```
post(ϕpre, x := e)                = (∃x.ϕpre) ∧ (x = e)    (if x ∉ vars(e))
post(ϕpre, havoc(x))              = ∃x.ϕpre
post(ϕpre, assume e)              = ϕpre ∧ e
post(ϕpre, assert e)              = ϕpre
post(ϕpre, S1; S2)                = post(post(ϕpre, S1), S2)
post(ϕpre, if e then S1 else S2)  = post(ϕpre ∧ e, S1) ∨ post(ϕpre ∧ ¬e, S2)
vc(ϕpre, assert e)                = (ϕpre ⇒ e)
vc(ϕpre, S1; S2)                  = vc(ϕpre, S1) ∧ vc(post(ϕpre, S1), S2)
vc(ϕpre, if e then S1 else S2)    = vc(ϕpre ∧ e, S1) ∧ vc(ϕpre ∧ ¬e, S2)
vc(ϕpre, S)                       = true    (for all other S)
postvc(p, ϕinv)                   = (post(ϕinv, TB(p, ϕinv)), vc(ϕinv, TB(p, ϕinv)) ∧ (init(p) ⇒ ϕinv))
```
**Fig. 3.** Generation of verification-condition and postcondition.

We then compute postconditions as formally described in Fig. 3. This lets us compute, for each program point ℓ in the procedure, a condition ϕ<sub>ℓ</sub> that describes what we expect to hold true when execution reaches ℓ if we start executing the procedure in a state satisfying ϕinv and if every recursive invocation of the procedure also terminates in a state satisfying ϕinv. We compute this using the standard rules for the postcondition of a statement. For an assignment statement "x := e", we use existential quantification over x to represent the value of x prior to the execution of the statement. If we rename these existentially quantified variables with unique new names, we can lift all the existential quantifiers to the outermost level. When transformed thus, the condition ϕ<sub>ℓ</sub> takes the form ∃x<sub>1</sub> ⋯ x<sub>n</sub>. ϕ′<sub>ℓ</sub>, where ϕ′<sub>ℓ</sub> is quantifier-free and x<sub>1</sub>, ⋯, x<sub>n</sub> denote intermediate values of variables along the execution path from procedure-entry to program point ℓ.
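The rules of Fig. 3 translate almost literally into a recursive function over statements. The Python sketch below builds formulas as strings, using a hypothetical tuple encoding of statements. It enforces the side condition on assignments (the assigned variable must not occur in the right-hand side) and does not rename quantified variables, so it illustrates the shape of the rules rather than serving as a usable verification-condition generator.

```python
import re


def variables(e):
    """Identifiers occurring in an expression string."""
    return set(re.findall(r"[A-Za-z_]\w*", e))


def conj(a, b):
    """Conjunction that drops trivial 'true' conjuncts."""
    if a == "true":
        return b
    if b == "true":
        return a
    return f"{a} ∧ {b}"


def post(pre, s):
    kind = s[0]
    if kind == "assign":                      # ("assign", x, e)
        _, x, e = s
        if x in variables(e):
            raise ValueError(f"{x} occurs in its own right-hand side")
        return f"(∃{x}. {pre}) ∧ ({x} = {e})"
    if kind == "havoc":                       # ("havoc", x)
        return f"(∃{s[1]}. {pre})"
    if kind == "assume":                      # ("assume", e)
        return f"{pre} ∧ {s[1]}"
    if kind == "assert":                      # ("assert", e)
        return pre
    if kind == "seq":                         # ("seq", S1, S2)
        return post(post(pre, s[1]), s[2])
    if kind == "if":                          # ("if", e, S1, S2)
        _, e, s1, s2 = s
        left = post(f"{pre} ∧ {e}", s1)
        right = post(f"{pre} ∧ ¬{e}", s2)
        return f"({left}) ∨ ({right})"
    raise ValueError(f"unknown statement form: {kind}")


def vc(pre, s):
    kind = s[0]
    if kind == "assert":
        return f"(({pre}) ⇒ {s[1]})"
    if kind == "seq":
        return conj(vc(pre, s[1]), vc(post(pre, s[1]), s[2]))
    if kind == "if":
        _, e, s1, s2 = s
        return conj(vc(f"{pre} ∧ {e}", s1), vc(f"{pre} ∧ ¬{e}", s2))
    return "true"                             # all other statements


guarded = ("seq", ("assume", "(n > 1)"), ("assert", "I"))
print(post("I", ("assign", "x", "y + 1")))
print(vc("I", guarded))
```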

We compute a verification condition ϕvc that represents the conditions we must check to ensure that an execution through the procedure satisfies its obligations: namely, that the invariant holds true at every call-site and at procedure-exit. Let ℓ denote a call-site or the procedure-exit. We need to check that ϕ<sub>ℓ</sub> ⇒ ϕinv holds. Thus, the generated verification condition essentially consists of the conjunction of this check over all call-sites and procedure-exit.

```
g := -1;
lastN := 0;
factCache(n) {
  if (n <= 1) {
    result := 1;
  } else if (g != -1 && n == lastN) {
    result := g;
  } else {
    t1 := n - 1;
    // t2 := factCache(t1);
    assert ϕinv;
    havoc(g); havoc(lastN);
    assume ϕinv ∧ (t2 = factCache(t1));
    g := n * t2;
    lastN := n;
    result := g;
  }
  // return result;
  assert ϕinv;
}
```
**Listing 1.2.** Procedure factCache from Listing 1.1 transformed to incorporate a supplied candidate invariant ϕinv.

Finally, the function postvc computes the postcondition and verification condition for the entire procedure as shown in Fig. 3. (Thus, it returns a pair of formulae.) Note that this function also adds the check that the initial state must satisfy ϕinv to the verification condition (as the basis condition for induction). init(p) is the formula "g1 = c1 ∧ ... ∧ gN = cN" (see Sect. 2).

*Example.* We now illustrate the postcondition and verification condition generated from our factorial example presented in Listing 1.1. Listing 1.2 shows the example expressed in our language and transformed as described earlier (using function TB), using a supplied candidate invariant ϕinv.

Figure 4 illustrates the computation of postcondition and verification condition from this transformed example. In this figure, we use ϕ<sup>pre</sup><sub>cs</sub> to denote the precondition computed to hold just before the recursive callsite, and ϕ<sup>post</sup><sub>cs</sub> to denote the postcondition computed to hold just after the recursive callsite. The postcondition ϕ<sup>post</sup> (at the end of the procedure body) is itself a disjunction of three path-conditions representing execution through the three different paths in the program. In this illustration, we have simplified the logical conditions by omitting useless existential quantifications (that is, any quantification of the form ∃x.ψ where x does not occur in ψ). Note that the existentially quantified g and lastN in ϕ<sup>post</sup><sub>cs</sub> denote the values of these globals before the recursive call. Similarly, the existentially quantified g and lastN in ϕ<sup>path</sup><sub>3</sub> denote the values of these globals when the recursive call terminates, while the free variables g and lastN denote the final values of these globals.

$$\begin{split} & \mathsf{init}(\mathsf{p}) = (\mathsf{g} = -1) \land (\mathsf{lastN} = 0) \\ & \varphi_{1}^{path} = \varphi^{inv} \land (\mathsf{n} \le 1) \land (\mathsf{result} = 1) \\ & \varphi_{2}^{path} = \varphi^{inv} \land \neg(\mathsf{n} \le 1) \land (\mathsf{g} \ne -1) \land (\mathsf{n} = \mathsf{lastN}) \land (\mathsf{result} = \mathsf{g}) \\ & \varphi_{cs}^{pre} = \varphi^{inv} \land \neg(\mathsf{n} \le 1) \land \neg((\mathsf{g} \ne -1) \land (\mathsf{n} = \mathsf{lastN})) \land (\mathsf{t1} = \mathsf{n} - 1) \\ & \varphi_{cs}^{post} = (\exists \mathsf{g}\, \exists \mathsf{lastN}.\ \varphi_{cs}^{pre}) \land \varphi^{inv} \land (\mathsf{t2} = \mathsf{factCache}(\mathsf{t1})) \\ & \varphi_{3}^{path} = (\exists \mathsf{g}\, \exists \mathsf{lastN}.\ \varphi_{cs}^{post}) \land (\mathsf{g} = \mathsf{n} * \mathsf{t2}) \land (\mathsf{lastN} = \mathsf{n}) \land (\mathsf{result} = \mathsf{g}) \\ & \varphi^{post} = \varphi_{1}^{path} \lor \varphi_{2}^{path} \lor \varphi_{3}^{path} \\ & \varphi^{vc} = (\varphi_{cs}^{pre} \Rightarrow \varphi^{inv}) \land (\varphi^{post} \Rightarrow \varphi^{inv}) \land (\mathsf{init}(\mathsf{p}) \Rightarrow \varphi^{inv}) \end{split}$$

**Fig. 4.** The different formulae computed from the procedure in Listing 1.2 by our postcondition and verification-condition computation.

#### **4.2 Approach 1: Existential Approach**

Let p be a procedure with input parameter n and return variable r. Let postvc(p, ϕinv) = (ϕpost, ϕvc). Let ψ<sup>e</sup> denote the formula ϕvc ∧ (ϕpost ⇒ (r = p(n))). Let x denote the sequence of all free variables in ψ<sup>e</sup> except for p. We define ea(p, ϕinv) to be the formula ∀x.ψ<sup>e</sup>.

In this approach, we use a theorem prover to check whether ea(p, ϕinv) is satisfiable. As shown by the following theorem, satisfiability of ea(p, ϕinv) establishes that p satisfies pure(ϕinv).

**Theorem 1.** *A procedure* p *satisfies* pure(ϕinv) *if* ∃p.ea(p, ϕinv) *is a tautology (which holds iff* ea(p, ϕinv) *is satisfiable).*

*Proof.* Note that p is the only free variable in ea(p, ϕinv). Assume that [p → f] is a satisfying assignment for ∀x.ψ<sup>e</sup>. We show that for every feasible execution π: (P1) (π, f) |= ϕinv, and (P2) for every trace π′ inside π, *output*(π′) = f(*input*(π′)). This implies that p satisfies pure(ϕinv).

In particular, for any feasible execution π, we prove by induction over the execution steps in π that


If the above properties fail to hold, we can identify a trace π corresponding to the first such failure. It can be shown that the sequence of states visited by this trace, when substituted for x, is a witness that [p → f] is not a satisfying assignment for ∀x.ψ<sup>e</sup>. This contradicts our original assumption.

Please see [4] for more details of the proof.
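As an informal illustration on the running example (a sketch, not part of the proof), take the candidate invariant that Sect. 5 derives for factCache, namely (g = −1 ∧ lastN = 0) ∨ g = lastN \* p(lastN − 1), and write p for the function symbol factCache. Reading each path condition of Fig. 4 against the requirement ϕpost ⇒ (result = p(n)) gives one constraint on p per path:

$$\begin{aligned} \varphi_{1}^{path} &: \quad \mathsf{n} \le 1,\ \mathsf{result} = 1 && \text{so } p(\mathsf{n}) = 1 \\ \varphi_{2}^{path} &: \quad \mathsf{n} > 1,\ \mathsf{result} = \mathsf{g} = \mathsf{lastN} * p(\mathsf{lastN} - 1),\ \mathsf{n} = \mathsf{lastN} && \text{so } p(\mathsf{n}) = \mathsf{n} * p(\mathsf{n} - 1) \\ \varphi_{3}^{path} &: \quad \mathsf{n} > 1,\ \mathsf{result} = \mathsf{g} = \mathsf{n} * \mathsf{t2},\ \mathsf{t2} = p(\mathsf{n} - 1) && \text{so } p(\mathsf{n}) = \mathsf{n} * p(\mathsf{n} - 1) \end{aligned}$$

In the second path the conjunct g ≠ −1 rules out the first disjunct of the invariant, which is why g = lastN \* p(lastN − 1) can be used. Interpreting p as the factorial function (with p(n) = 1 for n ≤ 1) satisfies all three constraints, which is consistent with ea(p, ϕinv) being satisfiable for this example.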

#### **4.3 Approach 2: Impurity Witness Approach**

The existential approach presented in the previous section has a drawback. Checking satisfiability of ea(p, ϕinv) is hard because it contains universal quantifiers and existing theorem provers do not work well enough for this approach. We now present an approximation of the existential approach that is easier to use with existing theorem provers. This new approach, which we will refer to as the impurity witness approach, reduces the problem to that of checking whether a quantifier-free formula is unsatisfiable, which is better suited to the capabilities of state-of-the-art theorem provers. This approach focuses on finding a counterexample to show that the procedure is impure or it violates the candidate invariant.

Let p be a procedure with input parameter n and return variable r. Let postvc(p, ϕinv) = (ϕpost, ϕvc). Let ϕ<sup>post</sup><sub>α</sub> denote the formula obtained by replacing every free variable x other than p in ϕpost by a new free variable x<sub>α</sub>. Define ϕ<sup>post</sup><sub>β</sub> similarly. Define iw(p, ϕinv) to be the formula (¬ϕvc) ∨ (ϕ<sup>post</sup><sub>α</sub> ∧ ϕ<sup>post</sup><sub>β</sub> ∧ (n<sub>α</sub> = n<sub>β</sub>) ∧ (r<sub>α</sub> ≠ r<sub>β</sub>)).

The impurity witness approach checks whether iw(p, ϕinv) is satisfiable. This can be done by separately checking whether ¬ϕvc is satisfiable and whether (ϕ<sup>post</sup><sub>α</sub> ∧ ϕ<sup>post</sup><sub>β</sub> ∧ (n<sub>α</sub> = n<sub>β</sub>) ∧ (r<sub>α</sub> ≠ r<sub>β</sub>)) is satisfiable. As formally defined, ϕvc and ϕpost contain embedded existential quantifications. As explained earlier, these existential quantifiers can be moved to the outside after variable renaming and can be omitted for a satisfiability check. (A formula of the form ∃x.ψ is satisfiable iff ψ is satisfiable.) As usual, these existential quantifiers refer to intermediate values of variables along an execution path. Finding a satisfying assignment to these variables essentially identifies a possible execution path (that satisfies some other property).

**Theorem 2.** *A procedure* p *satisfies* pure(ϕinv) *if* iw(p, ϕinv) *is unsatisfiable.*

*Proof.* We say that two traces disagree if they receive the same argument value but return different values. We say that a pair of feasible executions (π<sub>1</sub>, π<sub>2</sub>) is an *impurity witness* if there is a trace π<sub>a</sub> in π<sub>1</sub> and a trace π<sub>b</sub> in π<sub>2</sub> such that π<sub>a</sub> and π<sub>b</sub> disagree.

A trace is said to be compatible with a function f (and vice versa) if the trace's input-output behavior matches that of the function. An execution is said to be compatible with a function (and vice versa) if every trace in the execution is compatible with the function. We say that a feasible execution π *strongly satisfies* ϕinv if for every function f that is compatible with π, (π, f) |= ϕinv.

We prove the theorem using the following lemmas: if iw(p, ϕinv) is unsatisfiable, then Lemmas 2 and 3 imply that the preconditions of Lemma 1 hold and, hence, p satisfies pure(ϕinv).


3. If an impurity witness exists, then iw(p, ϕinv) is satisfiable.

1 is straightforward.

For 2, we use a "minimal" feasible execution π that does not strongly satisfy ϕinv to construct a satisfying assignment to ¬ϕvc.

For 3, we use a "minimal" impurity witness to construct a satisfying assignment to (ϕ<sup>post</sup><sub>α</sub> ∧ ϕ<sup>post</sup><sub>β</sub> ∧ (n<sub>α</sub> = n<sub>β</sub>) ∧ (r<sub>α</sub> ≠ r<sub>β</sub>)).

Please see [4] for more details of the proof.
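The notions of "disagreeing traces" and "impurity witness" used in this proof can be made concrete with a small dynamic sketch. This is not the approach of this section, which reasons symbolically with a theorem prover; it only runs a few concrete executions and looks for two traces with the same argument and different results. The procedure `make_call_counter` and the chosen call sequences are hypothetical, and finding no witness this way proves nothing about purity.

```python
def make_fact_cache():
    """Running example: factorial that caches its most recent result."""
    state = {"g": -1, "lastN": 0}   # globals g and lastN with their initial values
    traces = []                      # (argument, return value) of every invocation

    def fact_cache(n):
        if n <= 1:
            result = 1
        elif state["g"] != -1 and n == state["lastN"]:
            result = state["g"]      # cache hit
        else:
            state["g"] = n * fact_cache(n - 1)
            state["lastN"] = n
            result = state["g"]
        traces.append((n, result))
        return result

    return fact_cache, traces


def make_call_counter():
    """Hypothetical impure procedure: the result depends on a call counter."""
    state = {"calls": 0}
    traces = []

    def counted(n):
        state["calls"] += 1
        result = n + state["calls"]
        traces.append((n, result))
        return result

    return counted, traces


def find_impurity_witness(make_proc, executions):
    """Return two disagreeing traces (same argument, different result), or None."""
    seen = {}  # argument -> first return value observed in any execution
    for args in executions:
        proc, traces = make_proc()   # each execution starts from the initial state
        for n in args:
            proc(n)
        for n, result in traces:
            if n in seen and seen[n] != result:
                return (n, seen[n]), (n, result)
            seen.setdefault(n, result)
    return None


executions = [[5, 5, 3], [3, 4, 4, 5], [1, 2, 6, 6]]
print(find_impurity_witness(make_fact_cache, executions))    # None
print(find_impurity_witness(make_call_counter, executions))  # ((5, 6), (5, 7))
```

The traces include the nested recursive invocations, matching the definition above, in which any trace inside an execution may take part in a witness.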

## **5 Generating the Invariant**

We now describe a simple but reasonably effective semi-algorithm for generating a candidate invariant automatically from the given procedure. Our approach of Sect. 4 can be used with a manually provided invariant or the candidate invariant generated by this semi-algorithm (whenever it terminates).

The invariant-generation approach is iterative and computes a sequence of progressively weaker candidate invariants I<sub>0</sub>, I<sub>1</sub>, ··· and terminates if and when I<sub>m</sub> ≡ I<sub>m+1</sub>, at which point I<sub>m</sub> is returned as the candidate invariant. The initial candidate invariant I<sub>0</sub> captures the initial values of the global variables. In iteration k, we apply a procedure similar to the one described in Sect. 4 and compute the strongest conditions that hold true at every program point if the execution of the procedure starts in a state satisfying I<sub>k−1</sub> and if every recursive invocation terminates in a state satisfying I<sub>k−1</sub>. We then take the disjunction of the conditions computed at the points before the recursive call-sites and at the end of the procedure, and existentially quantify all local variables. We refer to the resulting formula as Next(I<sub>k−1</sub>, TB(p, I<sub>k−1</sub>)). We take the disjunction of this formula with I<sub>k−1</sub> and simplify it to get I<sub>k</sub>.

Figure 5 formalizes this semi-algorithm. Here, we exploit the fact that the assert statements are added precisely at every recursive callsite and at the end of the procedure, and these are the places whose conditions we take the disjunction of.

In our running example, I<sub>0</sub> is 'g = −1 ∧ lastN = 0'. Applying Next to I<sub>0</sub> yields I<sub>0</sub> itself as the pre-condition at the point just before the recursive call-site, and '(g = −1 ∧ lastN = 0) ∨ g = lastN \* p(lastN − 1)' (after certain simplifications) as the pre-condition at the end of the procedure. Therefore, I<sub>1</sub> is '(g = −1 ∧ lastN = 0) ∨ g = lastN \* p(lastN − 1)'. When we apply Next to I<sub>1</sub>, the computed pre-conditions are I<sub>1</sub> itself at both the program points mentioned above. Therefore, the approach terminates with I<sub>1</sub> as the candidate invariant.

```
I0 = init(p)
Ik = Simplify(Ik−1 ∨ Next(Ik−1, TB(p, Ik−1)))
```

```
Next(ϕpre, assert e) = ∃ℓ1 ··· ℓm. ϕpre (where ℓ1, ··· , ℓm are the local variables in ϕpre)
Next(ϕpre, S1; S2) = Next(ϕpre, S1) ∨ Next(post(ϕpre, S1), S2)
Next(ϕpre, if e then S1 else S2) = Next(ϕpre ∧ e, S1) ∨ Next(ϕpre ∧ ¬e, S2)
Next(ϕpre, S) = false (for all other S)
```
**Fig. 5.** Iterative computation of invariant.
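The iteration on the running example can be imitated with explicit sets of states. The sketch below is a finite analogue under stated assumptions, not the symbolic semi-algorithm: an invariant is a set of (g, lastN) pairs, arguments are limited to 0..`MAX_N`, the function symbol p is replaced by the concrete factorial, and set union plays the role of disjunction followed by Simplify. All names are illustrative.

```python
from math import factorial

MAX_N = 6  # assumption: only arguments 0..MAX_N are explored


def init():
    """I0: the initial values of the globals (g, lastN)."""
    return frozenset({(-1, 0)})


def next_states(inv):
    """States at the recursive call-site and at procedure exit, given inv."""
    reached = set()
    for (g, last_n) in inv:               # the entry state satisfies inv
        for n in range(MAX_N + 1):
            if n <= 1 or (g != -1 and n == last_n):
                reached.add((g, last_n))  # exit without touching the globals
            else:
                reached.add((g, last_n))  # state just before the call-site
                # After the call the globals are some state in inv, but both are
                # overwritten next, so the exit state does not depend on it.
                t2 = factorial(n - 1)     # stands in for the symbol p(n - 1)
                reached.add((n * t2, n))  # g = n * t2; lastN = n
    return frozenset(reached)


def generate_invariant():
    """Iterate I_k = I_{k-1} union Next(I_{k-1}) until nothing changes."""
    inv, rounds = init(), 0
    while True:
        new_inv = inv | next_states(inv)
        if new_inv == inv:
            return inv, rounds
        inv, rounds = new_inv, rounds + 1


invariant, rounds = generate_invariant()
print(rounds, sorted(invariant))
```

Under these assumptions the loop stabilises after one widening step, mirroring the text: the result contains the initial state together with the states in which g = lastN \* p(lastN − 1).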

## **6 Evaluation**

We have implemented our OP checking approach as a prototype using the Boogie framework [6], and have evaluated the approach using this implementation on several examples. The objective of this evaluation was primarily a sanity check, to test how our approach does on a set of OP as well as non-OP procedures.

We tried several simple non-OP programs, and our implementation terminated with a "no" answer on all of them. We also tried the approach on several OP procedures: (1) the 'factCache' running example, (2) a version of a factorial procedure that caches all arguments seen so far and their corresponding return values in an array, (3) a version of factorial that caches only the return value for argument value 19 in a scalar variable, (4) a recursive procedure that returns the n*th* Fibonacci number and caches all its arguments and corresponding return values seen so far in an array, and (5) a "matrix chain multiplication" (MCM) procedure. The last example is based on dynamic programming, and hence naturally uses a table to memoize results for sub-problems. Here, observational purity implies that the procedure always returns the same solution for a given sub-problem, whether a hit was found in the table or not. The appendix of a technical report associated with this paper depicts all the procedures mentioned above as created by us directly in Boogie's language, as well as the invariants that we supplied manually (in SMT2 format).

It is notable that the theorem prover was not able to handle the instances generated by the "existential approach" even for simple examples. The "impurity witness" approach, however, terminated on all the examples mentioned above with the correct answer, with the theorem prover taking less than 1 s on each example. Please see [4] for more information about the examples used in our evaluation.

## **7 Related Work**

The previous work that is most closely related to our work is by Barnett et al. [1,2]. Their approach is based on the same notion of observational purity as our approach. Their approach is structurally similar to ours, in terms of needing an invariant, and using an inductive check for both the validity of the invariant and the uniqueness of return values for a given argument. However, their approach is based on a more complex notion of invariant than ours, one that relates pairs of global states, and it does not use a function symbol to represent recursive calls within the procedure. Hence, their approach does not extend readily to recursive procedures; they in fact state that "there is a circularity it would take a delicate argument, and additional conditions, to avoid unsoundness in this case". Our idea of allowing the function symbol in the invariant to represent the recursive call allows recursive procedures to be checked, and also simplifies the specification of the invariant in many cases.

Cok et al. [7] generalize Barnett et al.'s work, and suggest classifying procedures into the categories "pure", "secret", and "query". The "query" procedures are observationally pure. Again, recursive procedures are not addressed.

Naumann [3] proposes a notion of observational purity that is also the same as ours. The paper gives a rigorous but manual methodology for proving the observational purity of a given procedure. This methodology is not similar to ours; rather, it is based on finding a *weakly pure* procedure that simulates the given procedure as far as externally visible state changes and the return value are concerned. It has no notion of an invariant that uses a function symbol that represents the procedure, and it does not explicitly address the checking of recursive procedures.

There exists a significant body of work on identifying differences between two similar procedures. A representative example is differential assertion checking [8], which checks whether two procedures can ever start from the same state but end in different states such that exactly one of the ending states fails a given assertion. Their approach is based on logical reasoning, and accommodates recursive procedures. Our impurity witness approach has some similarity with their approach, because it is based on comparing the given procedure with itself. However, our comparison is stricter, because in our setting, starting with a common argument value but from different global states that are both within the invariant should not cause a difference in the return value. Furthermore, technically our approach is different because we use an invariant that refers to a function symbol that represents the procedure being checked, which is not a feature of their invariants. Partush et al. [9] solve a problem similar to differential assertion checking, but using abstract interpretation instead of logical reasoning.

There is a substantial body of work on checking if a procedure is *pure*, in the sense that it does not modify any objects that existed before the procedure was invoked, and does not modify any global variables. Sălcianu et al. [10] describe a static analysis to check purity and Madhavan et al. [11] present an abstract-interpretation-based generalization of this analysis. Various tools exist, such as JML [12] and Spec# [13], that use logical techniques based on annotations to prove procedures as pure. Purity is a more restrictive notion than observational purity; procedures such as our 'factCache' example are observationally pure, but not pure because they use as well as update state that persists between calls to the procedure.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Software Evolution and Requirements Engineering

# **Structural and Nominal Cross-Language Clone Detection**

Lawton Nichols, Mehmet Emre, and Ben Hardekopf

University of California, Santa Barbara, USA {lawtonnichols,emre,benh}@cs.ucsb.edu

**Abstract.** In this paper we address the challenge of cross-language clone detection. Due to the rise of cross-language libraries and applications (e.g., apps written for both Android and iPhone), it has become common for code fragments in one language to be ported over into another language in an extension of the usual "copy and paste" coding methodology. As with single-language clones, it is important to be able to detect these cross-language clones. However, there are many real-world cross-language clones that existing techniques cannot detect.

We describe the first general, cross-language algorithm that combines both structural and nominal similarity to find syntactic clones, thereby enabling more complete clone detection than any existing technique. This algorithm also performs comparably to the state of the art in single-language clone detection when applied to single-language source code; thus it generalizes the state of the art in clone detection to detect both single- and cross-language clones using one technique.

## **1 Introduction**

The clone detection problem has long been recognized by the community, with many existing papers exploring different techniques for finding clones amongst code written in a single language [5,13,14,21,22]. However, in recent years an interesting twist has arisen due to the rising popularity of cross-language libraries and applications: *cross-language clones*. Consider the parser generator ANTLR [3], which has runtimes that are written in C#, C++, Go, Java, JavaScript, Python (2 and 3), and Swift. Also consider multi-platform mobile applications, which are often ported between Java and Objective-C or Swift, the languages used by Android and iPhone applications. In these kinds of settings, clones can actually cross language boundaries: a fragment of code in one language can be copied and massaged to conform to the syntax and semantics of another language. Existing single-language clone detection techniques are unable to effectively detect these sorts of cross-language clones. In this paper we propose a method to detect cross-language clones and demonstrate that it (1) finds cross-language clones that no existing method can detect; and (2) performs comparably to existing single-language clone detectors for finding clones within a corpus of single-language code sources. Therefore, our technique generalizes the current state of the art in clone detection by extending it to allow for both single-language and cross-language clone detection using a single technique.

```
Trees._findAllNodes = function(t, index, findTokens, nodes) {
    // check this node (the root) first
    if(findTokens && (t instanceof TerminalNode)) {
        if(t.symbol.type===index) {
            nodes.push(t);
        }
    } else if(!findTokens && (t instanceof ParserRuleContext)) {
        if(t.ruleIndex===index) {
            nodes.push(t);
        }
    }
    // check children
    for(var i=0;i<t.getChildCount();i++) {
        Trees._findAllNodes(t.getChild(i), index, findTokens, nodes);
    }
};
```

```
template<typename T>
static void _findAllNodes(ParseTree *t, size_t index, bool findTokens, std::vector<T> &nodes) {
  // check this node (the root) first
  if (findTokens && is<TerminalNode *>(t)) {
    TerminalNode *tnode = dynamic_cast<TerminalNode *>(t);
    if (tnode->getSymbol()->getType() == index) {
      nodes.push_back(t);
    }
  } else if (!findTokens && is<ParserRuleContext *>(t)) {
    ParserRuleContext *ctx = dynamic_cast<ParserRuleContext *>(t);
    if (ctx->getRuleIndex() == index) {
      nodes.push_back(t);
    }
  }
  // check children
  for (size_t i = 0; i < t->children.size(); i++) {
    _findAllNodes(t->children[i], index, findTokens, nodes);
  }
}
```
**Fig. 1.** A JavaScript (top) and C++ (bottom) clone pair doing a pre-order search.

**Fig. 2.** A JavaScript (left) and Java (right) clone pair setting the weight and inverse weight of a particle in a graphics application. A bug-fix has been applied to the JavaScript clone but not the Java clone.

To make this problem more concrete, consider Fig. 1, which shows a real-life case (found during our evaluation described in Sect. 6) of code clones involving C++ and JavaScript source code from the ANTLR parser generator [3]. To demonstrate the importance of finding cross-language clones, consider Fig. 2, which shows another real-life case (also found during our evaluation) of code clones involving JavaScript and Java in which a bug-fix has been applied to one of the clones but not the other. In addition, a quick search of the CVE (Common Vulnerabilities and Exposures) database yields a vulnerability due to incorrect message authentication checking that exists in multiple different language implementations of the relevant code [9].

There are only four existing papers that we are aware of that introduce new techniques for cross-language clone detection (discussed in more detail in Sect. 2). That initial work has either focused on clones across languages that share a common intermediate representation such as .NET [1,15] or has deviated from classical clone detection and taken a more restricted, natural-language-based approach, sometimes relying on assumptions that may not be met in real code [7,8]. None of that existing work would detect the clone examples given in Figs. 1 and 2 without extensive modification.

The main reason for these restrictions in previous work is that the *syntactic structure* (i.e., parse trees) of different languages can be extremely different even for code that, at the source level, seems similar. We demonstrate this phenomenon later in this paper. In order to overcome this problem, previous work has either restricted itself to languages with a common intermediate representation (thus enforcing that the syntactic structure is similar for similar code) or abandoned structural matching entirely and looked only at the names of variables and other user-defined abstractions (what we call *nominal* clone detection). We observe that using purely structural or purely nominal matching is sub-optimal in a cross-language setting, in that each can yield both false positives and false negatives.

Our technique consists of (1) a method for enabling structural matching for cross-language clones even in those cases where syntactic structure is different (Sect. 4); and (2) a method for composing both structural and nominal matching into a singular matcher, maintaining the strengths of each while mitigating their individual weaknesses (Sect. 5). We have implemented our technique in a tool called Fett<sup>1</sup> that works at the granularity of function pairs; we use Fett to empirically compare our proposed technique against existing techniques (Sect. 6). We begin by describing related work and background information in Sect. 2 and giving a high-level overview of our technique in Sect. 3.

## **2 Background and Related Work**

The concept of clone detection is not new, and the different techniques involved have been surveyed extensively [5,21]. Most existing non-semantics-based techniques can be categorized into the classes of "structural," "nominal," or "hybrid," which we define below.

Before we begin, there is a bit of misleading terminology in the literature: there exist many clone detection tools that are considered language-generic or language-agnostic (e.g., [22]), but can only be configured to work for programs written in a single language at a time. CCFinder [14], for example, can detect clones for six different programming languages; however, the user cannot (outside of naive text-only modes) truly cross language boundaries during a "language-generic" clone detection phase.

#### **2.1 What Exactly Is a Cross-Language Clone?**

Intuitively, we consider a cross-language clone to be the same as any same-language clone (two pieces of code that implement similar functionality); the only difference is the setting. We highlight here what kinds of clones our tool is able to find, and what kinds of clones we include in our evaluation based on their classification (i.e., Type I, II, III or IV [24]).

<sup>1</sup> Our implementation is located at http://www.cs.ucsb.edu/~pllab under the "Downloads" link.

The usual code clone hierarchy does not translate well to a cross-language setting: type I and type II clones [24] may not exist across languages because of syntactic differences between languages (e.g., switch statements exist in C but not in Python). In this paper, we present methods that discover syntactic clones modulo the differences in language syntax, and we do this by creating a correspondence between related but different constructs. We do not consider semantic (type IV) clones that implement the same functionality in a different way (e.g., quicksort vs. selection sort). Readers familiar with the standard clone hierarchy can think of the clones that we find as type III clones generalized across languages.

#### **2.2 Structural Program Similarity**

Intuitively, two programs (or subprograms) can be considered similar if they look the same, disregarding identifier names—i.e., if their syntax trees have roughly the same shape. We refer to structural clone detection as the process of taking advantage of this similarity.

Same-language clone detection tools usually also consider identifier data, and we are not aware of any purely structural cross-language clone detector. A notable same-language tool that operates via structural similarity is Deckard, which converts syntax trees into vectors for fast comparison [13].

Structural similarity is useful in all settings, but it is a hard problem in a multi-language setting—all the hybrid structural/nominal methods we describe below make some restriction on the languages involved. A major part of the novelty of our technique is a method for purely structural matching across languages (though the final algorithm then combines structural with nominal (i.e., identifier-based) techniques for greater accuracy).

#### **2.3 Nominal Program Similarity**

Whereas structural similarity disregards identifiers and instead looks at code shape, nominal similarity does the exact opposite. Nominal similarity relies on the insight that similar code, especially copied and pasted snippets, will have the same identifier names throughout, regardless of code structure.

Notable same-language clone detection tools that operate via nominal similarity are CCFinder and SourcererCC, which compare program tokens [14,25].
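To give a feel for what "nominal" means in practice, the following minimal sketch compares two fragments purely by the identifiers they contain. It uses a plain Jaccard index over identifier sets, which is a stand-in chosen for illustration and is not the metric of CCFinder, SourcererCC, or CLCMiner; the keyword list and the regular expression are likewise assumptions.

```python
import re

# Hypothetical, deliberately tiny keyword list; a real tool would use each language's lexer.
KEYWORDS = {"if", "else", "for", "var", "function", "return", "static", "void", "bool"}


def identifiers(source):
    """Set of identifier-like tokens in source, minus the keyword list."""
    return set(re.findall(r"[A-Za-z_]\w*", source)) - KEYWORDS


def nominal_similarity(a, b):
    """Jaccard index of the two identifier sets (1.0 means identical names)."""
    ids_a, ids_b = identifiers(a), identifiers(b)
    if not ids_a and not ids_b:
        return 1.0
    return len(ids_a & ids_b) / len(ids_a | ids_b)


js = "if (t.ruleIndex === index) { nodes.push(t); }"
cpp = "if (ctx->getRuleIndex() == index) { nodes.push_back(t); }"
print(nominal_similarity(js, cpp))  # 0.375: shares t, index and nodes out of 8 names
```

The two fragments are taken from Fig. 1. They are structurally almost identical, yet share fewer than half of their names, which hints at why a purely nominal score can miss ported code.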

*Across Languages.* Cheng et al. describe CLCMiner [8], the first cross-language clone detection tool that does not require the languages involved to translate to the same intermediate form. It compares revision histories (diffs) in repository logs for cross-platform C# and Java programs; the tokens inside commits are used to compute similarity scores. CLCMiner is the basis for the Nominal algorithm defined in Sect. 5.1.

Cheng et al. study a different notion of nominal similarity in [7], where they measure the effectiveness of token distributions in finding clones among cross-platform mobile applications; they obtain a negative result for identifier names alone. Flores et al. [10] use natural language processing techniques to discover cross-language clones at the function level.

#### **2.4 Hybrid Program Similarity**

It is logical to combine structural and nominal similarity methods, as the results they provide are complementary. A notable same-language, hybrid clone detection tool is NiCad, which performs its comparisons at the parse tree level [23]. Syntax tree-based comparison is quite common [4,27].

Tree similarity is computationally expensive [6], and it is more efficient to linearize programs in some way; sequence similarity algorithms can then do the comparison. Existing same-language work compares the tokens in the order in which they appear in the parse tree [11], and we also take advantage of linearization of full parse trees in this work.

*Across Languages.* Kraft et al. present C2D2 [15], the first cross-language clone detection tool, for C# and Visual Basic programs. This work requires that the languages involved be compiled to the same intermediate representation (IR)—.NET IR in this case. From a graph derived from that IR, they create sequences of tokens for subgraphs and use a Levenshtein distance-based token similarity algorithm to compare them.

Al-Omari et al. build on Kraft et al.'s work and find clones by comparing CIL intermediate code text [1]. Again, they are restricted to .NET languages.

*This work.* Our method is a hybrid method, works on any language with a grammar definition, and relies on just the source code (in contrast to, e.g., CLCMiner which requires the existence of revision history). We linearize preprocessed parse trees at the function level and compare the linearized sequences in a novel way that generalizes Kraft et al.'s work and incorporates features of Cheng et al.'s work.

#### **2.5 CLCMiner**

Our main comparison is with the only tool designed for cross-language clone detection and capable of handling arbitrary languages: CLCMiner [8]. We provide further background on it here. CLCMiner is based on having the source code in a version control system, and requires a revision history by design. Section 5.1 gives a detailed explanation of our adaptation of CLCMiner. The original CLCMiner algorithm works on diffs and lexes them, whereas our version works on function parse trees.

We were not able to obtain access to the original CLCMiner source code from the authors. In order to compare against this method, we implement our own version, which adapts CLCMiner to work with the entire text of a function and calculates CLCMiner's distance metric when given a function pair. Our new implementation may perform better or worse than the original (which uses revision history rather than function pairs) in certain cases.

We incorporate CLCMiner's distance metric in a novel way in Fett, and show that our combination of structural and nominal information produces better results. As we have adapted CLCMiner's algorithm to work on functions instead of diffs, it relies on having a parser to extract the functions and does not rely on a version control system. We refer to our nominal-only adaptation of CLCMiner's algorithm as "Nominal" for the rest of the paper.

## **3 Overview**

In this section we provide a high-level overview of Fett and provide justification for some of our steps. We give an end-to-end example of our clone detection process in our tech report [18]. Fett's pipeline is:


The following sections fill in the details of the structural and nominal aspects of Fett's cross-language clone detection process.

## **4 Structural Clone Detection**

One key insight of our structural algorithm is that *abstract* syntax trees (ASTs), which eliminate details in the concrete parse trees about how exactly the input was parsed or what language it came from, tend to look more similar for similar code even across languages. Unfortunately, ASTs are not part of a language's specification, and AST grammars and formats are implementation dependent. We are not aware of any single compiler that has frontends for the variety of languages that we compare. Our structural clone detection algorithm processes *reduced parse trees* (Sect. 4.1) to eliminate nonessential details about parsing and obtain a structure similar to ASTs.

Another source of disparity between trees generated by two grammars is that the nonterminals are different. The other key insight of our structural algorithm is that abstracting reduced parse trees by putting nonterminals in *equivalence classes* (Sect. 4.2) strikes a balance between preserving necessary information and smoothing out differences across languages.
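A minimal sketch of the equivalence-class idea follows. The nonterminal names, the class names, and the catch-all class `OTHER` are all hypothetical; the actual classes are defined in Sect. 4.2.

```python
# Hypothetical nonterminal names and classes; a real table would be written per grammar.
EQUIVALENCE_CLASSES = {
    "javascript": {
        "IfStatement": "IF",
        "ForStatement": "LOOP",
        "WhileStatement": "LOOP",
        "CallExpression": "CALL",
    },
    "python": {
        "if_stmt": "IF",
        "for_stmt": "LOOP",
        "while_stmt": "LOOP",
        "call": "CALL",
    },
}


def abstract(language, nonterminals):
    """Map each nonterminal to its class; unknown ones fall into a catch-all class."""
    table = EQUIVALENCE_CLASSES[language]
    return [table.get(name, "OTHER") for name in nonterminals]


js_seq = abstract("javascript", ["IfStatement", "CallExpression", "ForStatement"])
py_seq = abstract("python", ["if_stmt", "call", "while_stmt"])
print(js_seq == py_seq, js_seq)
```

Here two sequences with no nonterminal in common become identical after abstraction, while the distinction between a conditional, a loop, and a call is preserved.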

Our structural algorithm extracts functions from an abstracted parse tree and then computes similarity scores between functions using the Smith-Waterman local sequence alignment algorithm.
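For readers unfamiliar with Smith-Waterman, here is a compact sketch of the score computation on two label sequences. The match, mismatch, and gap values are placeholder assumptions (the scoring actually used by Fett is not stated at this point), and the sketch uses a linear gap penalty and returns only the best score, not the alignment itself.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


left = ["IF", "EXPR", "BLOCK", "IF", "EXPR", "BLOCK", "BLOCK"]
right = ["IF", "EXPR", "BLOCK", "EXPR", "BLOCK", "BLOCK"]
print(smith_waterman(left, right))  # 11: six matches and one gap
```

Because the alignment is local, one extra label in the middle of a sequence costs a single gap penalty rather than shifting every later position out of step.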

Flattening a tree using a preorder traversal helps smooth out most remaining inconsistencies between inter-language reduced parse trees. To demonstrate the dissimilarities due to grammatical differences that preorder traversal removes, see Fig. 3: a grammar that uses nested if statements will have a parse tree like Fig. 3b, while a grammar that uses unnested if statements will look more like Fig. 3c. As the else if cases become more numerous in the first grammar the nesting becomes more severe, emphasizing the differences in the resulting parse trees.

```
if ( exp ) block [else block] (G1)
if exp : block [elif exp : block]* [else block] (G2)
```
(a) Two different kinds of grammars for if statements.

(b) An example parse tree using the nested if grammar (G1).

(c) An example parse tree using the unnested if grammar (G2).

**Fig. 3.** Grammars and parse trees for nested vs. unnested if statements.
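To make the effect of flattening concrete, the following minimal sketch encodes two hypothetical trees for the same if / else-if / else chain, one nested in the style of (G1) and one flat in the style of (G2), and compares their preorder sequences. The tuple encoding, the node labels, and the tree shapes are illustrative assumptions and are not the trees of Fig. 3.

```python
# Trees are (label, [children]) tuples; labels and shapes are hypothetical.
def preorder(tree):
    """Return the node labels of `tree` in preorder."""
    label, children = tree
    seq = [label]
    for child in children:
        seq.extend(preorder(child))
    return seq

# G1 style: the "else if" case is an if statement nested in the else branch.
nested = ("if", [("exp", []), ("block", []),
                 ("if", [("exp", []), ("block", []), ("block", [])])])

# G2 style: every branch is a direct child of a single if node.
flat = ("if", [("exp", []), ("block", []),
               ("exp", []), ("block", []),
               ("block", [])])

nested_seq = preorder(nested)  # if, exp, block, if, exp, block, block
flat_seq = preorder(flat)      # if, exp, block, exp, block, block
```

The two trees differ in depth, yet in this example the flattened sequences differ only by the one extra `if` symbol contributed by the nested statement, which a sequence aligner can absorb as a single gap.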

#### **4.1 Precedence Woes**

Some grammar definitions encode operator precedence into the grammar<sup>2</sup>, whereas others use facilities provided by the parser generators to encode the precedence. Direct encoding of precedence causes spurious chains of nonterminals in the resulting parse tree, which would be removed when the parse tree is converted to an AST. We collapse the chains of nonterminals encountered in a parse tree for the direct encoding case to remove the chains and mitigate this disparity between different styles of grammars. Figure 4 demonstrates the kinds of issues that are apparent when a grammar hard-codes precedence—because precedence in this case appears in the form of nested productions, we always see "AdditiveExpression" even when there is only a multiplication expression present; this will throw off any clone detector that is working directly on plain parse trees.

If precedence is handled indirectly through the parser generator, then the resulting parse tree is much closer to an AST. This is an example of an issue that only arises in a cross-language setting, and which makes cross-language clone detection strictly more difficult than same-language clone detection. We condense any chains of nonterminals, and we refer to the parse trees after this stage as *reduced parse trees*.
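The chain-collapsing step can be sketched as follows. This is a minimal illustration under stated assumptions: terminals are plain strings, nonterminals are `(label, children)` tuples, and the lowest nonterminal of each chain is the one that survives. Apart from `AdditiveExpression`, the node names are hypothetical and are not taken from the C++ grammar used in the evaluation.

```python
def collapse_chains(node):
    """Collapse chains of single-child nonterminals in a parse tree."""
    if isinstance(node, str):  # terminal token: nothing to collapse
        return node
    label, children = node
    # Walk down while the only child is itself a nonterminal.
    while len(children) == 1 and not isinstance(children[0], str):
        label, children = children[0]
    return (label, [collapse_chains(child) for child in children])

# Hypothetical parse tree for "5*7" under a grammar that encodes precedence.
tree = ("AdditiveExpression", [
    ("MultiplicativeExpression", [
        ("MultiplicativeExpression", [("PrimaryExpression", [("Literal", ["5"])])]),
        "*",
        ("PrimaryExpression", [("Literal", ["7"])]),
    ]),
])

reduced = collapse_chains(tree)
```

In this example the spurious `AdditiveExpression` wrapper disappears and each operand shrinks to a single `Literal` node, which is closer to what an AST would contain.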

<sup>2</sup> We encountered this only in the C++ grammar during our evaluation.

**Fig. 4.** A subtree of the original C++ parse tree for the text "5\*7".

#### **4.2 Abstracting Parse Tree Nonterminals**

Consider the two reduced parse trees for the expression binarySearch(array, mid+1, high, x) in Figs. 5a and b. Although they look similar to the naked eye, because the node names are different, even a tree edit distance algorithm would say that the trees are not similar at all. We thus need to abstract the nonterminal names while preserving essential information about the tree structure. After performing this abstraction, we call the resulting parse trees *abstracted parse trees*.

(a) Reduced parse tree from a Java parser. (b) Reduced parse tree from a JavaScript parser. (c) Abstraction of the trees in Figs. 5a and 5b.

**Fig. 5.** Reduced parse trees for expression binarySearch(array, mid+1, high, x) in Java and JavaScript, and their abstraction. The terminals are omitted for simplicity.

Our method instead groups node types with similar meanings across languages, so that node types that "mean" similar things are in the same group. To do this, we *manually* categorize node types into equivalence classes *once per pair of languages*. For example, consider the equivalence classes $c_1 = \{\text{FunctionCall}, \text{ArgumentsExpression}\}$, $c_2 = \{\text{Primary}, \text{IdentifierExpression}\}$, $c_3 = \{\text{ArgumentList}, \text{ExpressionList}\}$, $c_4 = \{\text{NumericLiteral}, \text{Literal}\}$, $c_5 = \{\text{AdditiveExpression}\}$ and the set $C = \{c_1, c_2, c_3, c_4, c_5\}$. After replacing each node in Figs. 5a and b with its equivalence class in $C$, we end up with trees that are exactly the same (Fig. 5c). In this specific example the abstracted trees are the same, though this is not always the case in practice.

We define the abstraction algorithm in two parts: EqClassMapOf(C) produces a map from each node to a symbol corresponding to its equivalence class. Abstract(*tree*, *map*) does the abstraction by traversing the given tree bottom up and applying the map. It removes the *nonterminals* which do not belong to any equivalence class. When the abstraction algorithm removes a node, it connects any children of the removed node to the removed node's parent.
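The two parts of the abstraction can be sketched as below. The equivalence classes are the ones listed above. The tuple encoding, the two input trees (written without terminals), and the unclassified `Arguments` node are hypothetical stand-ins chosen for illustration; they are not the trees of Fig. 5.

```python
def eq_class_map_of(classes):
    """Map each node type to the symbol of its equivalence class."""
    return {node_type: symbol
            for symbol, members in classes.items()
            for node_type in members}

def abstract(tree, mapping):
    """Abstract `tree` bottom up; returns a list of resulting subtrees."""
    label, children = tree
    new_children = [sub for child in children for sub in abstract(child, mapping)]
    if label in mapping:
        return [(mapping[label], new_children)]
    # Node is in no equivalence class: drop it and hand its children to the parent.
    return new_children

C = {
    "c1": {"FunctionCall", "ArgumentsExpression"},
    "c2": {"Primary", "IdentifierExpression"},
    "c3": {"ArgumentList", "ExpressionList"},
    "c4": {"NumericLiteral", "Literal"},
    "c5": {"AdditiveExpression"},
}

# Hypothetical reduced trees for binarySearch(array, mid+1, high, x).
java_tree = ("FunctionCall", [("Primary", []), ("ExpressionList", [
    ("Primary", []),
    ("AdditiveExpression", [("Primary", []), ("Literal", [])]),
    ("Primary", []), ("Primary", [])])])
js_tree = ("ArgumentsExpression", [("IdentifierExpression", []),
    ("Arguments", [("ArgumentList", [
        ("IdentifierExpression", []),
        ("AdditiveExpression", [("IdentifierExpression", []), ("NumericLiteral", [])]),
        ("IdentifierExpression", []), ("IdentifierExpression", [])])])])

mapping = eq_class_map_of(C)
abstract_java = abstract(java_tree, mapping)
abstract_js = abstract(js_tree, mapping)
```

Returning a list from `abstract` is a design choice of this sketch: it lets a removed node pass several children up to its parent without special cases.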

#### **4.3 Sequence Alignment for Clone Detection**

Linearizing the trees via a preorder traversal of the nodes will remove most traces of the structural differences demonstrated in Fig. 3. Moreover, state-of-the-art tree edit distance algorithms are not as scalable as sequence alignment algorithms<sup>3</sup>. These observations led us to explore sequence alignment algorithms as an alternative to tree edit distance. Levenshtein distance is a popular choice in this category. Smith-Waterman is strictly more general than Levenshtein distance, and it supports assigning weights to different elements in the sequence. Hence, we use the Smith-Waterman algorithm on preordered trees to compute similarity scores. We evaluate the precision and recall of both Smith-Waterman and tree edit distance in Sect. 6 and observe that sequence alignment performs better in terms of precision and scalability.

We convert function subtrees to sequences by computing the preorder traversal. Finally, we execute Smith-Waterman using custom weights on each sequence pair and normalize the resulting score using the normalization factor Z described below. We chose the weights based on the hypothesis that certain nodes like conditionals indicate important program structure, and should generally appear in the same order in a cloned pair of functions; therefore, we assign higher weights to penalize the function pairs in which this alignment does not occur. In the algorithm, the function SmithWaterman(a, b, M, g) computes a similarity score between two sequences a and b using the Smith-Waterman algorithm with substitution matrix M and linear gap penalty coefficient g; a detailed explanation of these parameters can be found in [2].

*Normalizing Smith-Waterman results.* The result of the Smith-Waterman algorithm depends on the size of the input, and longer sequence pairs have higher scores. In order to find both short and long clones, we normalize the resulting similarity score from the Smith-Waterman algorithm to neutralize the bias towards longer clones.

We define the *self-similarity score* of a sequence $a$ as the score assigned to the pair $(a, a)$ by the *unnormalized* Smith-Waterman algorithm; denote this score $S(a)$. We normalize the score assigned to a pair $(a, b)$ by $\frac{1}{Z}$, where $Z = \max\{S(a), S(b)\}$. Note that $Z$ is an upper bound for the score obtained by Smith-Waterman, and the score is equal to $Z$ if and only if $a = b$. Thus, using the normalization factor $\frac{1}{Z}$ is useful if one is looking for similar whole functions rather than looking for a small snippet in a larger piece of code.

<sup>3</sup> APTED, the state-of-the-art tree edit distance algorithm, has a time complexity of $O(n^3)$ [20], whereas the variant of the Smith-Waterman algorithm we use is $O(n^2)$ [2].
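A minimal sketch of weighted Smith-Waterman scoring with the normalization described above follows. The paper does not spell out the substitution matrix here, so the scoring function is an assumption: a match scores the match coefficient times the node weight, and a mismatch is penalized by the coefficient times the larger of the two weights. The symbol names are hypothetical.

```python
WEIGHTS = {"conditional": 5, "loop": 5, "return": 5, "call": 5, "assignment": 2}
MU = 7  # match coefficient (assumed to scale the node weights)

def score(x, y):
    """Assumed substitution score: weighted match, weighted mismatch penalty."""
    w = WEIGHTS.get(x, 1)
    return MU * w if x == y else -MU * max(w, WEIGHTS.get(y, 1))

def smith_waterman(a, b, score, gap):
    """Best local alignment score of a and b with a linear gap penalty (gap < 0)."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, start=1):
            cell = max(0.0,
                       prev[j - 1] + score(x, y),  # align x with y
                       prev[j] + gap,              # x aligned with a gap
                       cur[j - 1] + gap)           # y aligned with a gap
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best

def normalized_similarity(a, b, gap=-2):
    """Score of (a, b) divided by Z = max(S(a), S(b))."""
    z = max(smith_waterman(a, a, score, gap), smith_waterman(b, b, score, gap))
    return smith_waterman(a, b, score, gap) / z if z else 0.0

a = ["conditional", "call", "return"]
b = ["conditional", "assignment", "call", "return"]
raw = smith_waterman(a, b, score, -2)
sim = normalized_similarity(a, b)
```

Under these assumed scores the only cost of aligning `a` with `b` is the single gap for the extra assignment, so the normalized similarity stays close to, but below, 1.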

## **5 Hybrid Algorithm**

Combining nominal and structural clone detection in a cross-language setting provides the best of both worlds, and mitigates any issues that running just one detection method might have.

Identifier names carry some meaning about the programmer intent and give a code snippet context. On the other hand, the structure of code (conditionals, loops, function calls, etc.) also carries information about programmer intent. Without this structural information, we might misidentify two pieces of code as clones. Our hybrid algorithm is guided by structural information while consulting the Nominal algorithm to use local context within structurally similar pieces of code.

#### **5.1 Our Nominal Algorithm**

We have adapted CLCMiner's algorithm to work on functions as our purely Nominal algorithm. For a given pair of functions $(f_1, f_2)$, our nominal matching algorithm consists of two parts.

The first part takes a function $f$, removes the comments and splits the tokens on each non-letter character (such as underscores or dashes). It then splits the camel case tokens into words and converts them to lowercase; each function becomes a bag of words that is represented by a characteristic vector, which holds the number of occurrences of each word. We denote the resulting characteristic vector as $v(f)$.

The second part of the algorithm computes a normalized distance between the two characteristic vectors $v_1, v_2$ according to the formula $d(v_1, v_2) = \frac{\lVert v_1 - v_2 \rVert_1}{\lVert v_1 \rVert_1 + \lVert v_2 \rVert_1}$, where $\lVert \cdot \rVert_1$ is the $\ell_1$ norm (i.e., the sum of the absolute values of every entry in the vector). This algorithm computes a distance between two given functions; to make it comparable to the other algorithms, we use $1 - d(v_1, v_2)$ as a similarity score.
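Both parts of the nominal algorithm can be sketched in a few lines. This is an illustrative approximation: comment removal is language specific and is assumed to have happened already, the camel-case regular expression is one possible splitting rule, and the two function headers are made-up inputs.

```python
import re
from collections import Counter

def characteristic_vector(source):
    """Bag of lowercase words from comment-free source text."""
    words = []
    for token in re.split(r"[^A-Za-z]+", source):  # split on non-letter characters
        # Split camelCase and PascalCase; keep runs of capitals together.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)
    return Counter(word.lower() for word in words)

def nominal_similarity(v1, v2):
    """1 - d(v1, v2), with d the normalized l1 distance between word counts."""
    diff = sum(abs(v1[w] - v2[w]) for w in v1.keys() | v2.keys())
    total = sum(v1.values()) + sum(v2.values())
    return 1.0 - diff / total if total else 1.0

v_java = characteristic_vector("int binarySearch(int[] array, int low, int high, int x)")
v_js = characteristic_vector("function binary_search(array, low, high, x)")
similarity = nominal_similarity(v_java, v_js)
```

Here the vectors differ only in the five occurrences of `int` and the single `function`, so the distance is 6/18 and the similarity is 2/3.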

#### **5.2 Full Algorithm**

Our full algorithm is provided in our tech report [18]. It is a combination of the structural and nominal algorithms: we linearize the parse trees, and consecutive terminal nodes become bags of words. Nonterminals are compared using our structural method, and bags of words are compared using our nominal method.

## **6 Evaluation**

In this section we compare our work against existing work on both cross-language and same-language clone detection.

#### **6.1 Implementation and Environment**

We have implemented our tool Fett in Scala and used the ANTLR parser framework as its front end, so that any language with an ANTLR grammar can be easily connected.

To test whether Fett can handle same-language clone detection with accuracy similar to that of specialized, language-specific tools, we configured NiCad 4.0 [23] to work at the function-level granularity and experimented with configurations until we found the best-performing one for our tests<sup>4</sup>.

Because we are comparing parse *trees*, we also want to determine how well we compete against state-of-the-art tree edit distance algorithms, so we evaluate one data set with APTED [19,20]. We normalize the similarities using the method described in [17], and, as this normalization method requires a metric distance, we could not introduce weights for matches. We can still weight mismatches, though. We found that the parameters *mismatch* = 1, *deletion* = *insertion* = 5, *match* = 0 gave us the best results overall.

We chose the threshold for ignored functions (defined in Sect. 4.3) to be θ = 35 for every experiment, and the exact tolerance parameters are given below for each case. We used the same set of equivalence classes with the same weights for all cases: conditional, loop, return, and function call were all weighted 5; assignments were weighted 2; and all other considered nodes were weighted 1.

Our experiments were run on a computer with an Intel i7 4790 3.6 GHz processor. Fett, Structural, Tree Edit Distance, and Nominal were given 8 GB maximum heap size and were set to use 4 threads.

#### **6.2 Methodology**

We used the standard statistical metrics of precision, recall, and F-measure to quantitatively assess the effectiveness of our different techniques.
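For reference, the three metrics can be computed from a list of similarity scores and the corresponding ground truth as sketched below. Treating a pair as a reported clone when its score is at least the cutoff is an assumption of this sketch, and the sample data are invented.

```python
def precision_recall_f(scores, truth, cutoff):
    """Precision, recall, and F-measure for pairs scored at or above `cutoff`."""
    tp = sum(1 for s, t in zip(scores, truth) if s >= cutoff and t)
    fp = sum(1 for s, t in zip(scores, truth) if s >= cutoff and not t)
    fn = sum(1 for s, t in zip(scores, truth) if s < cutoff and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

scores = [0.9, 0.7, 0.45, 0.3, 0.2]        # made-up similarity scores
truth = [True, True, False, True, False]   # made-up ground truth
precision, recall, f_measure = precision_recall_f(scores, truth, cutoff=0.4)
```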

Due to the sheer amount of possible clone candidates in large projects, it is difficult to manually obtain complete ground truth for clones in real-world programs. Hence, we created two separate data sets for evaluation:

*Manual programs set (handwritten set).* We implemented a set of small programs in different languages to create a setting in which we have complete knowledge of whether a pair of functions are clones. Statistics about the code are in Table 1.

*Randomly sampled program set (large set).* We chose four libraries that have implementations in different languages and set the tolerance parameters<sup>5</sup> defined in our algorithm (see [18]) to give the best results on a per-language pair basis. We randomly sampled functions from the files with the same names (ignoring extensions) and manually checked the pairs to create a sample with ground truth; this is essentially the sampling strategy used by Cheng et al. [8], applied to functions instead of diffs. We chose to reuse this sampling strategy due to the manual nature of our evaluation, and because we only possess finite human resources. It does not reflect the true distribution of clones, as function clone pairs are unlikely to be chosen in a standard uniform random sample; had we gone that route, our precision and recall scores would not have been meaningful. We are not aware of a better solution to this problem.

<sup>4</sup> NiCad: threshold = 0.5, minsize = 4, maxsize = 2500, rename = blind, filter = none, abstract = none, normalize = none.

<sup>5</sup> For Fett: $\mu = 6$ (match coefficient) and $g = -4$ (gap penalty) for the case of comparing Java and JavaScript, $(\mu, g) = (9, -1)$ for Java/C++ and JavaScript/C++, and $(8, -3)$ for Java/Java. The nominal multiplier was set to 2 for all but the Java/C++ and JavaScript/C++ cases, where it was set to 3. For the Structural algorithm: $(7, -1)$ for JavaScript/Java, $(8, -4)$ for Java/C++, $(0.5, -2)$ for Java/Java, and $(9, -4)$ for JavaScript/C++.

**Table 1.** Statistics of handwritten clones.

The first three libraries considered for this set are: the ANTLR parser framework, version 4 [3]; the toxiclibs computational design library [26]; and the ZXing barcode image processing library [28]. We also considered two ports of the LAME MP3 encoding library in different languages that were ported by different developers to assess the efficacy of clone detection tools in such a scenario: lamejs, a JavaScript port [16]; and java-lame, a Java port [12]. Statistics about the libraries are in Table 2.

**Table 2.** Statistics of libraries considered for evaluation. LoC: non-blank non-comment lines of code, Fun's: # of functions found in each project, Nont'l (Nontrivial) Fun's: # of functions whose reduced parse trees are > θ (the chosen threshold), Pairs: the # of possible fun. pairs, Same-File Pairs: # of pairs of functions coming from files with the same name (ignoring extensions), Sel'd: # of selected pairs, Runtime: total time (H:M:S) to run our method.


#### **6.3 Results**

For our main set of tests, we compare Fett against (1) our purely Structural algorithm (i.e., no token similarity), and (2) our Nominal algorithm. We also apply the APTED tree edit distance algorithm combined with our abstraction method on our handwritten data set; tree edit distance takes at least an order of magnitude longer than the other tools, and we did not evaluate the large data set using tree edit distance because of this and due to its poor performance on the handwritten tests. We use NiCad on the Java-Java same-language case of our large data set.

*Cumulative clone ratios.* We look at the graphs of cumulative clone distributions to choose a good cut-off point for each of the three techniques. These graphs were originally used in [8], and they are meant to give an intuition about where a clone detector separates clones from non-clones.

Similarity vs. cumulative clone ratio graphs track the ratio of clones to non-clones as the similarity score varies from 1.0 to 0. For example, at point 0.4 on the similarity axis, we plot the ratio of clones to non-clones of all samples with similarity scores > 0.4. A successful clone detector would have a similarity value at which there is a significant drop in this ratio, and that would create the optimal cutoff point. A clone detector may not assign very high scores to any pairs based on its similarity metric; in such cases, we start the plot from the first nonempty bin. Figure 7 shows the cumulative clone ratios for antlrj and toxic; graphs of other test cases are omitted because of space constraints, but they are of similar overall shape. We chose a cutoff point for each clone detector based on the drops from these graphs (e.g. we chose the cutoff point of 0.4 for Fett's Java/Java case). The relative shape of the graph is more important than absolute scores: squishing or stretching the similarity scores only affects the choice of the optimal cutoff point.
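The points of such a graph can be computed as in the sketch below. One interpretive assumption is made loudly here: the plotted value is taken to be the fraction of clones among all samples above the threshold, which keeps the value bounded when no non-clones are selected; the exact quantity plotted in the figures may differ. The sample data are invented.

```python
def cumulative_clone_ratio(scores, truth, thresholds):
    """(threshold, clone fraction) points, from high thresholds to low ones."""
    points = []
    for t in sorted(thresholds, reverse=True):
        selected = [is_clone for s, is_clone in zip(scores, truth) if s > t]
        if not selected:
            continue  # empty bin: start the plot at the first nonempty one
        points.append((t, sum(selected) / len(selected)))
    return points

scores = [0.9, 0.7, 0.45, 0.3, 0.2]        # made-up similarity scores
truth = [True, True, False, True, False]   # made-up ground truth
points = cumulative_clone_ratio(scores, truth, [1.0, 0.8, 0.6, 0.4, 0.2, 0.0])
```

In this toy data the value stays at 1.0 down to a threshold of 0.6 and drops once the threshold reaches 0.4, which is the kind of drop used to pick a cutoff.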

*Handwritten test set.* When evaluating the manually created (handwritten) data set, we used the same parameters $\mu = 7$, $g = -2$ overall for all pairs of functions in the data set and considered the combined results for both Fett and the Structural algorithm. Fett had its nominal multiplier set to 2. Figure 6 shows the clone distributions of different clone detection methods for the handwritten program set; and precision, recall, and F-measure (harmonic mean of precision and recall) for this set are given in Table 3. Fett and the Structural algorithm had a cutoff of 0.5, and the Nominal algorithm's cutoff was 0.6.

*Handwritten test set discussion.* The table and the figures paint a similar picture. Both Fett and the Structural algorithm seem to perform the best on this data set—the graphs for the higher similarity scores have a high clone ratio, and there is a sharp decline visible in both graphs as the similarity score is allowed to lower. The Nominal algorithm has a less sharp drop, and this indicates that it is assigning mid-range similarity scores with low precision. It is also notable that tree edit distance does so poorly; we believe that this is because we are not allowed to give weights to matches, as described above.

**Fig. 6.** Cumulative clone ratio distribution for handwritten programs. Results of Fett and structural coincide.

**Table 3.** Precision, recall, and F-measure for handwritten program set.


*Large test set.* We now present and discuss all the cross-language results for our large test set. The same-language case is different from the cross-language cases, so the reader is asked to consult Fig. 7b, which is indicative of all the cross-language cases, and not Fig. 7a.

Cutoffs were chosen on a per-language pair basis that maximized a given tool's score. For Fett, for the three JavaScript/Java test cases and the Java/C++ test case, we used a cutoff of 0.4, and the rest used a cutoff of 0.5. For the Structural algorithm, we used a cutoff of 0.6 for JavaScript/Java, 0.5 for Java/C++ and JavaScript/C++, and 0.4 for Java/Java. For the Nominal algorithm, we used a cutoff of 0.5 for JavaScript/C++, and 0.6 for the rest.

Figure 8 shows precision, recall and F-measure of all the tools we compared for each data set and provides a visual and quantitative assessment of the efficacy of all the techniques.

*Large test set discussion.* Clone ratios relate most closely to the precision scores for each data set, and from the results it appears that the Structural algorithm generally has the upper hand in this area—applying the intuition described above, we see that the Structural algorithm seems to cut off at the sharpest angle in most cases. It makes sense why this is the case, as pieces of code that look similar across languages are generally prime candidates for clones.

Precision is of course not the whole story. It is clear that Fett is able to take the best of both the nominal and structural worlds, and the F-measure is always the highest. When it comes to Structural's results, the toxiclibs case is an outlier, where we found that there were more cases of the structural differences; Fett's hybrid structural/nominal algorithm was able to make up for this, though.

*Same-language test case.* To assess performance on same-language clones, we compared our tool with NiCad on the Java version of ANTLR. Returning to the same figures, the antlrj case is quite similar to the other language pairs in terms of precision, recall, and F-measure, which demonstrates that our tool is capable of holding its ground in a same-language setting.

Fett performs slightly worse (by one percentage point in terms of F-measure) than NiCad. This result is not surprising because NiCad uses more information about the code whereas we deliberately discard some information by abstracting parse trees to work in a cross-language setting. Even with our filtering of parse trees, Fett's F-measure score is very close, and this shows that our tool is capable of producing similar results to a dedicated same-language tool.

**Fig. 7.** Similarity vs. cumulative clone ratio for the samples from the large open-source program set.

**Fig. 8.** Precision, recall and F-measure of clone detection tools on the large program set.

*Overall results.* We observe that Fett's hybrid algorithm, in terms of F-measure, outperforms both the Nominal algorithm and the Structural algorithm consistently in our large test set experiments.

*Limitations.* Fett may have difficulty scaling to repositories with large numbers of large functions—a run of Fett on the entire toxiclibs library (comparing every function pair, not just same file pairs) takes 5.13 h—and so further improvements will be required to enable such a target. One possible future direction for improvements could be to develop semi-automated solutions where we have the user use her domain knowledge and pick out the files or functions to compare beforehand, or the user can prune the search space by telling the tool which modules are unrelated.

## **7 Conclusion**

We have presented Fett, a hybrid structural/nominal clone detection method that is capable of operating across programming languages and that is generic in the sense that it does not require any languages involved to belong to the same language family. It is syntax-based, uses ready-made grammar specifications, and requires minimal manual effort—the keys to the process are syntax abstraction and sequence alignment. We have provided a two-part evaluation of Fett, and we empirically demonstrate on multiple test sets that Fett is accurate in terms of the standard metrics of precision and recall. We also confirm that our method is on a par with previous work when it comes to same-language clone detection, thus proving that it is strictly more general than single-language methods.

**Acknowledgments.** This work was supported by NSF CCF-1319060.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **SL2SF: Refactoring Simulink to Stateflow**

Stephen Wynn-Williams<sup>1(✉)</sup>, Zinovy Diskin<sup>1</sup>, Vera Pantelic<sup>1</sup>, Mark Lawford<sup>1</sup>, Gehan Selim<sup>1</sup>, Curtis Milo<sup>1</sup>, Moustapha Diab<sup>2</sup>, and Feisel Weslati<sup>2</sup>

<sup>1</sup> McMaster Centre for Software Certification, McMaster University, Hamilton, ON, Canada {wynnwisj,diskinz,pantelv,lawford,selimg,milocj}@mcmaster.ca <sup>2</sup> FCA US LLC, Auburn Hills, MI, USA moustapha.diab@external.fcagroup.com, faz.weslati@fcagroup.com

**Abstract.** In the Matlab Simulink environment, systems can be modelled using Simulink block diagrams and Stateflow state charts. While stateful logic is more naturally modelled using Stateflow, in practice complex block diagrams are often used instead, resulting in models that are hard to understand and maintain. In order to improve the maintainability and understandability of large industrial models, this paper presents a strategy for refactoring Simulink block diagrams implementing stateful logic into functionally equivalent Stateflow state charts that more naturally represent the intended behaviour. To bridge the gap between the syntax of block diagrams and state charts, Mealy machines represented by tabular expressions are used as an intermediate representation. The compositional language of block diagrams is used to combine tables modelling individual blocks into a table for the entire block diagram which describes the high level state machine encoded in the Simulink subsystem. A prototype tool that performs the translation from Simulink to Stateflow automatically is discussed.

**Keywords:** Simulink · Stateflow · Refactoring · Mealy machines · Tabular expressions · Monoidal categories

## **1 Introduction**

The adoption of Model-Based Design in the development of embedded control systems across industries has led to the wide use of Matlab/Simulink/Stateflow as a supporting environment. The modelling capabilities provided by Simulink block diagrams and Stateflow state charts complement each other by providing languages for functional and stateful system specifications. Due to their individual strengths, one modelling formalism may be preferable for specifying certain classes of behaviours. For example, the MathWorks Automotive Advisory Board (MAAB) guidelines [25] advise the use of Stateflow over Simulink for modelling stateful logic. This is because Simulink block diagrams that are used to model mode switching logic are often cumbersome and difficult to understand. In this case, Stateflow state charts should be used to implement the same logic resulting in a structure which is easier to read, maintain, and verify.

For example, each model in Fig. 1 executes periodically to update its state and outputs. When the block diagram in Fig. 1a updates, each signal line is given a value and each block uses the values of the incoming signals to determine the values of the outgoing signals. When the state chart in Fig. 1b updates, it checks each condition on transitions leaving its current mode (i.e. state node). If a condition is satisfied, the state chart transitions to the associated target mode and executes the *exit* actions of the mode it is leaving, the actions on the transition it is taking, and the *entry* actions of the mode it is entering. If no transitions are valid, the state chart remains in its current mode and executes the *during* actions of that mode.

**Fig. 1.** Model of a timer in Simulink and Stateflow.

The Simulink and Stateflow models shown in Fig. 1 are functionally equivalent. Both models capture a timer with one boolean input, *start*, and one boolean output, *running*. When *start* becomes true, the system starts counting down from ten to zero. While the system is counting down, *running* is true. Once the counter reaches zero, *running* is set to false and becomes true again if *start* is true. Although there are relatively few blocks in Fig. 1a, it is difficult to understand how this model achieves the behaviour while the state chart in Fig. 1b clearly captures the system's modes and the conditions triggering mode changes.

Our industrial experience has identified the need to refactor Simulink block diagrams to Stateflow state charts for easier comprehension and maintenance. More precisely, practice shows that Simulink is often used to specify stateful logic even though Stateflow would be a more appropriate implementation language. This might occur during model evolution when modes of operation are added to previously mode-free block diagrams, and developers find it easier to modify the existing Simulink logic to accommodate the change than to reproduce the behaviour from scratch in a state chart. Other times, a developer's preference dictates the choice of modelling formalism. Manual refactoring from Simulink to Stateflow, although feasible, is a time consuming and error prone process which requires that the behaviour of complex Simulink models is completely understood.

This paper presents an approach to translate block diagrams into behaviourally equivalent state charts. The approach converts individual blocks into tabular expressions [21] to expose their latent state variables and decision logic. The data flow between blocks is then used to combine tables into a single, larger table describing the entire block diagram. Then, the elements of state charts (states, transitions) are identified by reconfiguring the combined tables into a form similar to state charts. Behavioural equivalence is established by giving semantics to block diagrams, state charts, and the intermediate tables as Mealy machines. The paper's main contributions are: (i) A method for translating Simulink block diagrams to Stateflow state charts via tabular expressions. (ii) A categorical framework for composing Mealy machines by combining their update functions as the basis of the translation. (iii) A prototype tool implementing the translation from Simulink to Stateflow.

This paper is organized as follows. Section 2 describes how we model systems and our categorical framework for combining them. Section 3 illustrates the translation method with a simple example. Section 4 describes the application of the categorical framework to convert block diagrams to tabular expressions. Section 5 explains how tabular expressions are converted to state charts. Section 6 describes the prototype tool. Related work is covered in Sect. 7 and the paper concludes with Sect. 8.

## **2 Background: Modelling Systems and Their Combinations**

This section describes the formalisms underlying the proposed translation approach: Mealy machines, tabular expressions, and monoidal categories.

#### **2.1 Mealy Machines: Modelling Stateful Systems**

To preserve behaviour, the semantics of both block diagrams and state charts are modeled using *Mealy machines*.

**Definition 1.** A *Mealy machine* $m$ is a tuple $(S, s_0, \Sigma, \Lambda, ud)$, where $S$ is a set of states (the *state space*), $s_0 \in S$ is the *initial state*, $\Sigma$ is a set of input values (the *input alphabet*), $\Lambda$ is a set of output values (the *output alphabet*), and $ud : \Sigma \times S \to \Lambda \times S$ is a function (the *update function*) which computes the current output and next state from the current input and current state.

For example, the unit delay $\frac{1}{z}$ block labelled counter in Fig. 1a can be modelled as the Mealy machine $\mathit{delay} = (\mathbb{R}, 0, \mathbb{R}, \mathbb{R}, \mathit{shift})$. The block has an input variable (port) $i$, an output variable (port) $o$, and an internal state variable $\mathit{counter}$, where $i, o, \mathit{counter} \in \mathbb{R}$. When the block updates, it outputs the current state value, $o = \mathit{counter}$, and updates the state to store the current input value, $\mathit{counter} = i$, i.e. $(o, \mathit{counter}) = \mathit{shift}(i, \mathit{counter})$, where $\mathit{shift} : \mathbb{R}^2 \to \mathbb{R}^2$ is defined as $\mathit{shift}(i, \mathit{counter}) = (\mathit{counter}, i)$.
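The unit delay example can be run directly. The sketch below is one possible encoding of Definition 1, in which the alphabets are left implicit in the Python types; the class name `MealyMachine` and the method `step` are illustrative choices rather than part of any tool described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class MealyMachine:
    state: Any                                     # current state, initially s0
    update: Callable[[Any, Any], Tuple[Any, Any]]  # ud(input, state) -> (output, next state)

    def step(self, value):
        """Consume one input, advance the state, and return the output."""
        output, self.state = self.update(value, self.state)
        return output

def shift(i, counter):
    """Update function of the unit delay: emit the stored value, store the input."""
    return (counter, i)

delay = MealyMachine(state=0.0, update=shift)
outputs = [delay.step(x) for x in [3.0, 1.0, 4.0]]
```

Each output is the input from the previous step, with the initial state 0 emitted first, which is the behaviour expected of a unit delay.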

While Simulink has no formal semantics, our use of Mealy machines to model the behaviour of Simulink models is consistent with the informal semantics described in Chap. 3 of the Simulink User Guide [26].

#### **2.2 Tabular Expressions: Representing Conditional Behaviours**

Both block diagrams and state charts can specify decision logic, but in rather distinct ways. We unify the presentation of decision logic in the two formalisms using two similar forms of tabular expressions: *horizontal condition tables* (HCTs) as presented in [28]; and *state transition tables* (STTs), which specialize HCTs to describe state charts similarly to the ones presented in [24].


**Fig. 2.** Intermediate representations

An HCT is represented in Fig. 2a. It is a tabular representation of the update function of a Mealy machine which models the block diagram from Fig. 1a. Given the variable values *start* = true and *counter* = 0, the table can be evaluated from left to right in the following way. Since the first condition *start* of the first column is satisfied, and the sub-condition *counter* ≤ 0 in the second row of the second column is satisfied, we use the second row to determine that *running* is given the value of *false*, and *counter* is given a value of 10.

STTs, the second tabular representation, are also used to represent the update functions of Mealy machines. Their special format closely matches the state charts they model. For example, the STT in Fig. 2b represents the state chart in Fig. 1b. Each mode is listed in the first column, and the condition of each transition is listed in the second column, adjacent to the mode it leaves. The columns after the double bars describe how each output/state variable is updated by the actions of the associated transition. The final column of each row indicates which mode the associated transition leads to.

Tabular expressions were given a precise semantics in [10]. The structure of tables can be rearranged without changing the function they describe, e.g., conditions can be reordered as in [4]; conditions can be combined with subconditions (via conjunction) to flatten the hierarchy of conditions; and normal expressions in the table can be simplified by assuming the conditions to their left hold.

#### **2.3 Categorical Framework: Combining Systems**

The key idea of block diagrams is to combine simple, predefined blocks to describe a behaviour. The language of *monoidal categories* explains how to break down the complex data flow of block diagrams and describe it in terms of simpler data flow [5] (i.e. cascading blocks in sequence, placing blocks in parallel, and feeding outputs of blocks back to their inputs).

Monoidal categories describe data flow in an abstract setting where blocks are called *morphisms*. Simple data flow constructs are described as operations on morphisms, which can be visualized using block diagrams called *string diagrams* [5,22]. In this section, we discuss the wiring constructs in the concrete setting of the category **Set**, where morphisms are functions from an input set of tuples to an output set of tuples (called the *domain*/*codomain objects* of the morphism).

**Fig. 3.** Functional fragment of timer example

A fragment of the block diagram from Fig. 1a can be used to illustrate the idea behind the basic data flow operations. The string diagram in Fig. 3 describes a function that is broken down into sub-functions combined via two operations: sequential combination (denoted ";") and parallel combination (denoted "⊗"). The fragment describes a function g from R × B to R. Each wire extending from the left/right of the large compound function indicates an input/output value, respectively. The wire is labelled with the set from which the value comes. If there are multiple wires, the domain or codomain of the function is given as the Cartesian product of those sets. In monoidal categories, the Cartesian product is generalized as an operation called the *monoidal product on objects*.

The function g is composed of a sequence of sub-functions, g = f<sub>1</sub>; f<sub>2</sub>; f<sub>3</sub>; f<sub>4</sub>. The sub-functions (except for f<sub>4</sub>) consist of functions composed in parallel with wires and other functions. The wiring "data routing functions" are then defined as follows: a normal wire is the *identity* function id<sub>X</sub> = {(x) → (x)}; wires crossing over each other define the *braiding* function Br<sub>A,B</sub> = {(a, b) → (b, a)}; and branching wires are called the *diagonal* function Δ<sub>X</sub> = {(x) → (x, x)}. The functions are indexed with the set(s) over which they are defined. Morphisms like these functions have special status in monoidal categories and must satisfy some axioms to verify that they "act like wiring" in the host category.

Sub-function f<sub>3</sub> can now be described as f<sub>3</sub> = add ⊗ id<sub>R</sub> ⊗ id<sub>R</sub>. Functions combined in parallel have domains/codomains which are the Cartesian products of the domain/codomain of the component functions. The parallel combination uses each component function independently to calculate each component of the output. For example, taking add = {(x<sub>1</sub>, x<sub>2</sub>) → (x<sub>1</sub> + x<sub>2</sub>)}, the function add ⊗ id<sub>R</sub> ⊗ id<sub>R</sub> is given by {(x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>, x<sub>4</sub>) → (x<sub>1</sub> + x<sub>2</sub>, x<sub>3</sub>, x<sub>4</sub>)}. In monoidal categories this operation is generalized as the *monoidal product on morphisms*, where the domain/codomain of a product morphism is given by the monoidal product of the domain/codomain objects of the component morphisms. It is notable that we can also describe sub-function f<sub>3</sub> as f<sub>3</sub> = add ⊗ id<sub>R<sup>2</sup></sub>, where the two wires are treated as one function. This is useful, for example, when describing the sub-function f<sub>2</sub> as f<sub>2</sub> = Br<sub>R<sup>2</sup>,R</sub> ⊗ *sw*<sub>B</sub>.

Describing f<sub>1</sub> requires modelling constant blocks as functions. Therefore, constants are described as functions with inputs from the singleton set 1 = {()}, and we draw functions with domain/codomain 1 as blocks with no wires extending from the left/right side, respectively. Functions modelling constant blocks, [k] = {() → (k)}, always take the empty tuple as input, and always produce the same value k as output. The function f<sub>1</sub> can now be described as f<sub>1</sub> = Δ<sub>R</sub> ⊗ [−1] ⊗ [10] ⊗ id<sub>B</sub> ⊗ [0]. Objects like 1 have special status in monoidal categories and are called the *monoidal unit*. Taking their monoidal product with any other object X yields the same object X. Intuitively, this means that concatenating any tuple (x<sub>1</sub>, ..., x<sub>n</sub>) with the empty tuple () does nothing. This explains why the product of the domains of the functions in f<sub>1</sub> is the set R × 1 × 1 × B × 1, but the domain of f<sub>1</sub> is described as R × B: the former simplifies to the latter.

We now describe the entire function g in terms of simple data flow:

$$g = (\Delta\_{\mathbf{R}} \otimes [-1] \otimes [10] \otimes \mathrm{id}\_{\mathbf{B}} \otimes [0]); (\mathrm{Br}\_{\mathbf{R}^2, \mathbf{R}} \otimes sw\_{\mathbf{B}}); (add \otimes \mathrm{id}\_{\mathbf{R}} \otimes \mathrm{id}\_{\mathbf{R}}); sw\_{\mathbf{R}}$$
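The decomposition of g can be replayed with a small combinator sketch in Python. Functions act on flat tuples, which makes the unit object 1 disappear automatically. The names `Fn`, `seq`, `par`, `braid` and `const` are illustrative choices, and the behaviour of the two switch functions is an assumption (the text does not define them): `sw_B` selects on a Boolean control, and `sw_R` passes its first input when the control is positive.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Fn:
    """A function on flat tuples, tagged with its number of input wires."""
    n_in: int
    apply: Callable[[Tuple], Tuple]

    def __call__(self, *args):
        assert len(args) == self.n_in
        return self.apply(tuple(args))


def seq(*fs: Fn) -> Fn:
    """Sequential combination f1; f2; ...: pass each output tuple to the next function."""
    def run(xs):
        for f in fs:
            xs = f.apply(xs)
        return xs
    return Fn(fs[0].n_in, run)


def par(*fs: Fn) -> Fn:
    """Parallel combination: split the input wires, concatenate the outputs."""
    def run(xs):
        out, pos = (), 0
        for f in fs:
            out += f.apply(xs[pos:pos + f.n_in])
            pos += f.n_in
        return out
    return Fn(sum(f.n_in for f in fs), run)


ident = Fn(1, lambda xs: xs)           # id_X
diag = Fn(1, lambda xs: xs + xs)       # Delta_X


def braid(n: int, m: int) -> Fn:
    """Br_{A,B}, where A spans n wires and B spans m wires."""
    return Fn(n + m, lambda xs: xs[n:] + xs[:n])


def const(k) -> Fn:
    """[k] : 1 -> X, a block with no input wires."""
    return Fn(0, lambda xs: (k,))


add = Fn(2, lambda xs: (xs[0] + xs[1],))
# Assumed switch semantics, see the lead-in:
sw_B = Fn(3, lambda xs: (xs[0] if xs[1] else xs[2],))
sw_R = Fn(3, lambda xs: (xs[0] if xs[1] > 0 else xs[2],))

f1 = par(diag, const(-1), const(10), ident, const(0))
f2 = par(braid(2, 1), sw_B)
f3 = par(add, ident, ident)
f4 = sw_R
g = seq(f1, f2, f3, f4)

print(g(3, True), g(0, True), g(0, False))  # (2,) (10,) (0,)
```

Under these assumptions g decrements a positive counter and otherwise reloads it with 10 or 0 depending on the Boolean input.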

However, this example does not contain feedback loops. Loops are obtained when inputs and outputs of a function are connected by some common wire(s), such as the wire connecting the first input and first output of the inner box in Fig. 4a. Adding looping wires to a function f : X × A → X × B yields a new function f<sup>∗</sup> : A → B (e.g., the outer box in Fig. 4a) where f<sup>∗</sup>(a) = b if there exists a unique x ∈ X such that f(x, a) = (x, b). When such an x exists for each a ∈ A, the loop configuration is considered *well-formed*. Following [11], we encode the addition of such loops with a *trace* operation: Tr<sup>X</sup><sub>A,B</sub>(f) = f<sup>∗</sup>.

For example, consider the function f = {(x, y) → (x + x, x + y)}. In the function Tr<sup>R</sup><sub>R,R</sub>(f) the trace applies the constraint that the first input is equal to the first output (i.e. x = x + x), to which there is a unique solution: x = 0. Given any y ∈ R, f(0, y) = (0, y), therefore Tr<sup>R</sup><sub>R,R</sub>(f) = {(y) → (y)}. This approach uses *fixed point equations* to specify traces, which is generalized by the approach from [8]. Since these fixed point equations are not guaranteed to have a unique solution, the trace operation is *partial*: it is only defined for loop configurations that are well-formed. Partial traces have been described in [15], and the guarded structure introduced in [7] compositionally describes which feedback configurations are valid. For the loops to "act like wiring", certain axioms must be satisfied, e.g., the *yanking* axiom (as shown in Fig. 4b) states that Tr<sup>X</sup><sub>X,X</sub>(Br<sub>X,X</sub>) = id<sub>X</sub> for any set X.
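As a quick check, the yanking axiom can be verified in **Set** directly from the fixed-point reading of the trace given above. The following short derivation is a worked illustration, not taken from the paper; it uses only the definitions of the braiding and of the trace stated in this section.

```latex
\begin{align*}
\mathrm{Br}_{X,X}(x, a) &= (a, x)
  && \text{definition of the braiding} \\
(a, x) = (x, b) &\iff x = a \ \text{and}\ b = x
  && \text{loop constraint } f(x, a) = (x, b) \\
\mathrm{Tr}^{X}_{X,X}(\mathrm{Br}_{X,X})(a) &= b = a = \mathrm{id}_{X}(a)
  && \text{the solution } x = a \text{ is unique}
\end{align*}
```

Since a unique x exists for every a ∈ X, this loop configuration is well-formed and the trace is defined on all of X.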

**Fig. 4.** String diagrams for traced categories

## **3 Translation Strategy**

The translation strategy is composed of three steps. This section illustrates these steps by considering the example from Fig. 1.

First, the decision logic implemented by the block diagram is encoded as the HCT in Fig. 8a. This step is described in Sect. 4. In the second step, the representation is simplified: depending on the value of *counter*, only some rows of the table can apply. By associating a certain range of state variable values with a mode of operation, we simplify the representation by considering only the conditions which are possible. This allows us to leverage the conditions from HCTs to determine the modes of operation by rearranging HCTs into equivalent STTs such as Fig. 2b. The final step trivially rearranges the information from STTs into a state chart by creating a transition for each row. The conversion from HCTs to STTs to state charts is described in Sect. 5, and possible simplifications to the resulting state chart are discussed.

Even with such a simple example, the importance of automated refactoring becomes apparent. If the model were to be refactored manually, a state chart that is not equivalent to the block diagram could be created unintentionally. For example, one can manually produce a state chart that transitions out of the *Running* mode when *counter* is zero, rather than one.

## **4 Block Diagrams to HCTs: Mealy Composition**

The first step of the translation strategy is to model the entire block diagram as a Mealy machine whose update function is represented as an HCT. To achieve this, Simulink block diagrams are modelled in a category **Mealy**, where morphisms (i.e. blocks) are Mealy machines, not functions. We then show how the update functions of composite Mealy machines built from the operations described in Sect. 2.3 can be built from the update functions of the component Mealy machines using the same operations on functions. Then, the predefined update functions of individual blocks can be represented using HCTs and combined according to the functional combinations derived from the block diagram.

#### **4.1 Mealy Machines and Their Combinations via Functions**

In this section, we consider a category **Mealy** whose objects are sets, and whose morphisms m : Σ → Λ are Mealy machines with input alphabet Σ, and output alphabet Λ. Composition of morphisms is given by the usual definition of cascade composition of Mealy machines [13]. We also introduce a monoidal product, giving the category a monoidal structure. It is defined on objects as the Cartesian product of sets, and on morphisms as the parallel composition of Mealy machines. The unit of the monoidal product is the same as for sets, the set containing one element: 1. Considering equality of morphisms up to bisimilarity results in a structure similar to the one used in [9] to describe symmetric lenses—according to [9], this structure forms a (symmetric) monoidal category.

While the cascade/parallel composition of Mealy machines is well understood (see, e.g. [13]), we introduce a definition for the update functions of the composed machines which wires together the update functions of the individual machines. Because string diagrams are used to represent both Mealy machines and their update functions, let us introduce some graphical notation to differentiate them. For Mealy machines, the string diagrams use black boxes to denote component Mealy machines (e.g. Fig. 5a). The update function *ud* of a Mealy machine m can be expressed using the projection mapping ⟦m⟧<sub>ud</sub> = *ud*. For update functions, the string diagram is decorated with grey backing to group the inputs/outputs of the update function into two main components: the upper components describe the inputs/outputs to the Mealy machine, and the lower components describe the current/next state (e.g. Fig. 5d).

**Fig. 5.** Composite Mealy machines and their update functions

Two Mealy machines m<sub>1</sub> = (S<sub>1</sub>, s<sub>0</sub><sup>1</sup>, Σ, Θ, ud<sub>1</sub>) and m<sub>2</sub> = (S<sub>2</sub>, s<sub>0</sub><sup>2</sup>, Θ, Λ, ud<sub>2</sub>) can be composed in sequence as illustrated by Fig. 5a to form the composite Mealy machine m<sub>1</sub>; m<sub>2</sub> = (S<sub>1</sub> × S<sub>2</sub>, (s<sub>0</sub><sup>1</sup>, s<sub>0</sub><sup>2</sup>), Σ, Λ, ud′). The update function ud′ for m<sub>1</sub>; m<sub>2</sub>, with the string diagram in Fig. 5d, is defined as:

$$[[m\_1; m\_2]]\_{ud} = ([[m\_1]]\_{ud} \otimes id\_{S\_2}); (id\_{\Theta} \otimes \text{Br}\_{S\_1, S\_2}); ([[m\_2]]\_{ud} \otimes id\_{S\_1}); (id\_{\Lambda} \otimes \text{Br}\_{S\_2, S\_1})$$

The parallel composition of Mealy machines m<sub>1</sub> and m<sub>2</sub>, with input alphabets Σ<sub>1</sub>, Σ<sub>2</sub> and output alphabets Λ<sub>1</sub>, Λ<sub>2</sub>, is the Mealy machine m<sub>1</sub> ⊗ m<sub>2</sub> = (S<sub>1</sub> × S<sub>2</sub>, (s<sub>0</sub><sup>1</sup>, s<sub>0</sub><sup>2</sup>), Σ<sub>1</sub> × Σ<sub>2</sub>, Λ<sub>1</sub> × Λ<sub>2</sub>, ud′) as illustrated by Fig. 5b. The update function ud′ for m<sub>1</sub> ⊗ m<sub>2</sub>, with the string diagram in Fig. 5e, is defined as:

$$[[m\_1 \otimes m\_2]]\_{ud} = (id\_{\Sigma\_1} \otimes \text{Br}\_{\Sigma\_2, S\_1} \otimes id\_{S\_2}); ([[m\_1]]\_{ud} \otimes [[m\_2]]\_{ud}); (id\_{\Lambda\_1} \otimes \text{Br}\_{S\_1, \Lambda\_2} \otimes id\_{S\_2})$$

Feedback configurations of Mealy machines (e.g., Fig. 5c) can be defined with fixed-point equations, such as in [13]. We give an equivalent description in terms of the trace operation in **Set**. A Mealy machine m = (S, s<sub>0</sub>, Θ × Σ, Θ × Λ, ud) can be traced to form the machine Tr<sup>Θ</sup><sub>Σ,Λ</sub>(m) = (S, s<sub>0</sub>, Σ, Λ, ud′), where the update function ud′ is defined as ⟦Tr<sup>Θ</sup><sub>Σ,Λ</sub>(m)⟧<sub>ud</sub> = Tr<sup>Θ</sup><sub>Σ×S,Λ×S</sub>(⟦m⟧<sub>ud</sub>), as illustrated by Fig. 5f. Since this operation is defined in terms of traces in **Set**, many of the properties of traces can be derived from traces in **Set**.

The above results mean that if we know the update functions of individual Simulink blocks, then we can model the update functions of block diagrams which configure those blocks in sequence, in parallel, and with feedback.
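The cascade and parallel constructions above can be sketched on update functions directly. The Python below is a minimal illustration that writes out the data flow of the two defining equations by hand instead of building it from braidings; the helper names `cascade`, `parallel` and `run` are chosen here for the example.

```python
from typing import Callable, List, Tuple

# An update function maps (input, state) to (output, next_state).
Update = Callable[[object, object], Tuple[object, object]]


def cascade(ud1: Update, ud2: Update) -> Update:
    """Update function of m1; m2 over the state space S1 x S2."""
    def ud(i, state):
        s1, s2 = state
        mid, s1_next = ud1(i, s1)      # run m1, leaving S2 untouched
        o, s2_next = ud2(mid, s2)      # feed m1's output into m2
        return o, (s1_next, s2_next)   # states returned in S1 x S2 order
    return ud


def parallel(ud1: Update, ud2: Update) -> Update:
    """Update function of m1 (x) m2: both machines step independently."""
    def ud(i, state):
        (i1, i2), (s1, s2) = i, state
        o1, s1_next = ud1(i1, s1)
        o2, s2_next = ud2(i2, s2)
        return (o1, o2), (s1_next, s2_next)
    return ud


def shift(i, s):
    """Update function of the unit delay from Sect. 2.1."""
    return s, i


def run(ud: Update, state, inputs) -> List[object]:
    """Iterate an update function over a list of inputs."""
    outputs = []
    for i in inputs:
        o, state = ud(i, state)
        outputs.append(o)
    return outputs


print(run(cascade(shift, shift), (0, 0), [1, 2, 3, 4]))        # [0, 0, 1, 2]
print(run(parallel(shift, shift), (0, 10), [(1, 5), (2, 6)]))  # [(0, 10), (1, 5)]
```

Two delays in sequence postpone the input by two steps, while two delays in parallel postpone each component by one step.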

#### **4.2 Functional Embedding and Wiring Morphisms**

In this section, we address the fact that a large part of a Simulink block diagram *looks* very functional (i.e. stateless). For example, many of the blocks and wiring in Fig. 1a can be modelled as functions. For this reason, we consider a class of Mealy machines which produce outputs as a function of only their current inputs. Any function f : X → Y can be described as the Mealy machine Mf = (1,(), X, Y, f), with one state, and update function f (see Fig. 6a). The mapping M *embeds* morphisms from **Set** into the category **Mealy**, because any two embedded functions Mf and Mg interact in **Mealy** very similarly to the way they interact as functions in **Set**.

**Fig. 6.** Embedded functions and their interactions

This explains how functional aspects of Simulink block diagrams can be modelled with Mealy machines. For example, the block labelled *Mode* in Fig. 1a can be modelled with the Mealy machine M*sw*<sub>R</sub>. Perhaps more importantly, the morphisms introduced to describe wiring in functional diagrams (i.e. id<sub>X</sub>, Δ<sub>X</sub>, Br<sub>A,B</sub>) can again be used to describe the same (functional) wiring for Mealy machines. Therefore, in string diagrams representing Mealy machines, plain wires represent the morphism Mid<sub>X</sub> which carries data without changing it, branching wires represent the morphism MΔ<sub>X</sub> which duplicates data, and crossing wires represent the morphism MBr<sub>A,B</sub> which reorders the components of data. The fact that Mid<sub>X</sub> and MBr<sub>A,B</sub> "act like wiring" is established in [9].

This establishes how to model wiring and functional blocks in Simulink block diagrams as Mealy machines. We can now use the operations from Sect. 4.1 to describe block diagrams which use complex wiring and functional blocks in combinations with stateful blocks.

#### **4.3 Block Diagrams to Horizontal Condition Tables**

We have explained how the categorical structure from Sect. 2.3 applies to **Mealy**, and related it to the same structure in **Set**. This framework allows us to combine update functions of individual blocks into update functions of entire block diagrams using the above definitions. For example, the update function ⟦M*sw*<sub>R</sub>; *delay*⟧<sub>ud</sub> of the machine from Fig. 6b is equal to

$$([[M sw\_{\mathbf{R}}]]\_{ud} \otimes id\_{\mathbf{R}}); (id\_{\mathbf{R}} \otimes \text{Br}\_{1, \mathbf{R}}); ([[delay]]\_{ud} \otimes id\_{1}); (id\_{\mathbf{R}} \otimes \text{Br}\_{\mathbf{R}, 1}),$$

as shown in Fig. 6c, where the "1" wire is drawn in grey to illustrate how it achieves the data flow described by Fig. 5d (normally, this wire is not drawn). This can be simplified, e.g., the final sequential sub-function id<sub>R</sub> ⊗ Br<sub>R,1</sub> is given by {(x, (y, ())) → (x, ((), y))} which simplifies to {(x, y) → (x, y)} by flattening tuples. Our presentation of monoidal categories skips the formalities which describe this simplification, but it can be intuitively understood by considering the data flow described in Fig. 6c if the grey wire were absent (as usual). Taking ⟦*delay*⟧<sub>ud</sub> = *shift* (as defined in Sect. 2.1), which we now describe as Br<sub>R,R</sub>, and using ⟦M*sw*<sub>R</sub>⟧<sub>ud</sub> = sw<sub>R</sub> along with appropriate axioms over the wiring morphisms, ⟦M*sw*<sub>R</sub>; *delay*⟧<sub>ud</sub> simplifies to (sw<sub>R</sub> ⊗ id<sub>R</sub>); Br<sub>R,R</sub>. This simplification can be intuitively understood by considering only the black data flow in Fig. 6c. In the same way that we describe the functional data flow of Fig. 3, this approach can be repeated to describe the entire block diagram in Fig. 1a, not just the combination of blocks labelled Mode and counter.
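The simplification can be checked numerically on a few sample points. The sketch below embeds a switch function as a one-state machine, cascades it with the delay, and compares the result with the simplified form on flattened tuples. The switch semantics (pass the first input when the control is positive, otherwise the third) is an assumption made for the example, and `embed`, `cascade` and `agree` are hypothetical helper names.

```python
def embed(f):
    """M f: a Mealy update function with the single state () that just applies f."""
    return lambda i, s: (f(i), ())


def cascade(ud1, ud2):
    """Update function of m1; m2 over the state space S1 x S2."""
    def ud(i, state):
        s1, s2 = state
        mid, s1_next = ud1(i, s1)
        o, s2_next = ud2(mid, s2)
        return o, (s1_next, s2_next)
    return ud


def sw(inputs):
    """Assumed switch: first input if the control is positive, else the third."""
    a, ctrl, b = inputs
    return a if ctrl > 0 else b


def shift(i, s):
    """Update function of the unit delay."""
    return s, i


composite = cascade(embed(sw), shift)   # state space 1 x R


def simplified(i, counter):
    """(sw (x) id); Br on flattened tuples: emit the old state, store sw(i)."""
    return counter, sw(i)


def agree(i, counter):
    o1, (_unit, s1) = composite(i, ((), counter))   # drop the unit component
    o2, s2 = simplified(i, counter)
    return (o1, s1) == (o2, s2)


samples = [((a, c, 7.0), k)
           for a in (1.0, 2.0) for c in (-1.0, 0.0, 3.0) for k in (0.0, 5.0)]
print(all(agree(i, k) for i, k in samples))  # True
```

The only difference between the two forms is the unit component of the state, which carries no information.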

This example illustrates how our categorical algebra for Mealy machines is structurally similar to the one used in [6] which describes the algorithm that represents block diagrams in terms of sequential/parallel/feedback configurations of components. The algorithm from [6] constructs descriptions which contain no feedback operations. A similar result can be shown in our framework, allowing us to produce trace-free descriptions of update functions in terms of the update functions of their components.

**Fig. 7.** The update function of a Mealy machine with feedback

As mentioned in Sect. 2.3, not all feedback configurations are valid. The validity of a feedback configuration describing a Mealy machine is decided by determining whether or not the trace on its update function is defined. In many settings, the trace is defined if the aforementioned fixed-point equations have a unique solution [13]. However, for Simulink models that are used to generate embedded software, the configuration must satisfy a stricter validity condition: there must be no *algebraic loops*. This means there can be no cyclic dependencies in the underlying update function, so any feedback can be trivially removed by rearranging the components and wiring to "yank out" the loops while preserving the connections between blocks. For example, Fig. 7 illustrates how the update function of a simple feedback configuration can be rearranged to remove loops. This can be formalized by the notion of vacuous guardedness introduced in [7].

This means that the update functions of well-formed block diagrams can be modelled without traces. In this manner, the update function of the block diagram in Fig. 1a can be described as

$$\mathrm{Br}\_{\mathbf{B},\mathbf{R}}; ([-1] \otimes \Delta\_{\mathbf{R}} \otimes [10] \otimes \mathrm{id}\_{\mathbf{B}} \otimes [0]); (add \otimes \Delta\_{\mathbf{R}} \otimes sw\_{\mathbf{B}}); (\mathrm{id}\_{\mathbf{R}^{2}} \otimes \mathrm{Br}\_{\mathbf{R},\mathbf{R}}); (sw\_{\mathbf{R}} \otimes gtz); \mathrm{Br}\_{\mathbf{R},\mathbf{B}}$$

where each individual function has a fixed definition, and can be represented as a predefined tabular expression. Here *gtz* denotes the > 0 block labelled IsRunning. Functions whose behaviours are not conditional are trivially represented by a table with a single condition: *true*.

HCTs, being representations of functions, can be composed like functions. We modify the composition operation in [20] to describe HCTs so that we can compose predefined tabular expressions as stated above. When composing two HCTs sequentially, the conditions of the first HCT appear first in the composed HCT and the conditions of the second HCT are included as subconditions. The conditions from the second HCT are evaluated using the output values from the first one. Consider, for example, the composition of Fig. 8a with Fig. 8b, where the output *counter* of the first table is routed to the input *counter* of the second (ignore the *running* output for now). Their composition is shown in Fig. 8c (ignore the *running* and *counter* outputs). The conditions *counter* > 0 and *start* (and their complements) appear in the same configuration as the first HCT. However, the sub-conditions (e.g. *counter* − 1 ≤ 0) come from the conditions (*counter* ≤ 0) in the second HCT, evaluated with the values (*counter* → *counter* − 1) from the row in the first HCT associated with the parent condition (*counter* > 0). The conditions 10 > 0 and 0 > 0 (and their complements) are generated in a similar manner, but because they are trivially satisfied/impossible conditions, the sub-conditions/entire row can be removed (the removable conditions/rows are shaded in Fig. 8c).

**Fig. 8.** Introducing modes

Similarly to the conditions, the output expressions of the second HCT are evaluated with the corresponding values from the first HCT, and those are used as the output expressions of the combined HCT. In Fig. 8b, the output values for mode are constants, therefore they appear unchanged in Fig. 8c. For HCTs composed in parallel, the conditions from the second HCT are once again used as sub-conditions, but they are not modified. Similarly, the output expressions from both HCTs are placed in the combined table unchanged.
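Sequential composition of HCTs can be illustrated with a small Python sketch in which a table is a list of rows, each pairing a condition with an output expression. This is not the composition operation of [20]: conditions are opaque predicates here, so the symbolic removal of trivially true or impossible sub-conditions is not performed. The row contents are read off the prose description of Fig. 8a and 8b, and the mode name `Idle` is a hypothetical placeholder, since only *Running* is named in the text.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, object]
Row = Tuple[Callable[[Env], bool], Callable[[Env], Env]]  # (condition, outputs)
HCT = List[Row]


def evaluate(table: HCT, env: Env) -> Env:
    """Return the outputs of the single row whose condition holds."""
    matches = [out(env) for cond, out in table if cond(env)]
    assert len(matches) == 1, "conditions must be disjoint and complete"
    return matches[0]


def compose(first: HCT, second: HCT) -> HCT:
    """Sequential composition: `second` is evaluated on the outputs of `first`."""
    rows: HCT = []
    for c1, o1 in first:
        for c2, o2 in second:
            rows.append((
                # parent condition from `first`, sub-condition from `second`
                lambda env, c1=c1, o1=o1, c2=c2: c1(env) and c2(o1(env)),
                lambda env, o1=o1, o2=o2: o2(o1(env)),
            ))
    return rows


# Timer update function, as described for Fig. 8a (state condition first).
ud: HCT = [
    (lambda e: e["counter"] > 0,
     lambda e: {"running": True, "counter": e["counter"] - 1}),
    (lambda e: e["counter"] <= 0 and e["start"],
     lambda e: {"running": False, "counter": 10}),
    (lambda e: e["counter"] <= 0 and not e["start"],
     lambda e: {"running": False, "counter": 0}),
]

# Mode table, as described for Fig. 8b ("Idle" is a made-up name).
md: HCT = [
    (lambda e: e["counter"] > 0, lambda e: {"mode": "Running"}),
    (lambda e: e["counter"] <= 0, lambda e: {"mode": "Idle"}),
]

composed = compose(ud, md)
print(evaluate(composed, {"start": False, "counter": 1}))  # {'mode': 'Idle'}
print(evaluate(composed, {"start": True, "counter": 0}))   # {'mode': 'Running'}
```

The first call exercises the sub-condition *counter* − 1 ≤ 0 under the parent condition *counter* > 0: the next mode is no longer Running once the counter is one.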

The predefined HCTs representing each function in the equation above can be combined using the operations described above to achieve a tabular expression for the entire block diagram. For example, the tabular expression in Fig. 2a can be obtained this way.

## **5 HCTs to STTs: Modes via Tables**

The HCTs produced using the technique described in Sect. 4 are an intermediate representation in our translation strategy. They illustrate the decision logic of the system as a whole, but the logic is not related to state the way it is for state charts, i.e., through modes. This section explains how HCTs are augmented with modes to form STTs, and finally state charts.

#### **5.1 Defining Modes**

The STTs described in Sect. 2.2 have obvious similarities to state charts, but they are just syntactic sugar for HCTs. STTs and state charts are modelled as Mealy machines with a special state variable *mode* with values from an enumerated set M (see, e.g., extended state machines in [2]). The cells in the first column of STTs (see Fig. 2b) express conditions of the form *mode* = *Running* which compare the value of *mode* to each element of M. The last column identifies the updated value of *mode* . Therefore, the state spaces of Mealy machines modelling STTs and state charts have the form Q = S × M, where M is the set of modes, and S contains tuples of the other state variable values.

An HCT produced via the techniques in the previous section describes the update function ud of a Mealy machine m = (S, s<sub>0</sub>, Σ, Λ, ud). We will enhance m with a state variable *mode* to produce a Mealy machine m<sup>+</sup> = (S × M, (s<sub>0</sub>, *mode*<sub>0</sub>), Σ, Λ, ud<sup>+</sup>) whose update function is given by an HCT which matches the format of an STT. To achieve the goal of improving readability, we leverage the existing decision logic in HCTs.

When a state chart updates, it only considers the transitions leaving its current mode, i.e., depending on its *state*, only some behaviours are possible. The same dependence on state is expressed in HCTs by conditions which depend only on the values of state variables, which will be referred to as *state conditions*. For example, in Fig. 8a, if the condition *counter* > 0 is satisfied, the system can only do one thing: decrement *counter* and set *running* to true. Our strategy associates the condition *counter* > 0 with a mode of operation Running ∈ M, and replaces the original condition with mode = Running. We augment the HCTs into STTs in a way that preserves the behaviour of the Mealy machines.

As the modes are all listed in the first column of an STT, the first augmentation reorders conditions in HCTs so that the state conditions appear first. For example, the conditions in Fig. 2a can be rearranged via the methods in [4] to obtain Fig. 8a. While our example contains only one pair of state conditions, HCTs describing general block diagrams may contain multiple nested state conditions. The second augmentation uses conjunction to flatten nested state conditions into a single column with a condition for each branch of the stateful logic.

The augmented HCT now has a specific form (Fig. 8a) which superficially resembles an STT, but the behaviour is unchanged. We now introduce a set of modes M with each element associated with a distinct condition in the first column of the augmented HCT. This association is defined by a function *md* : S → M which maps tuples of state variable values to the mode whose associated state condition is satisfied. This function is represented by an HCT with the state conditions from the augmented HCT, and distinct values from M as outputs. The md function for the timer example is given by the HCT in Fig. 8b.

Next, the Mealy machine is enhanced by introducing a state variable *mode* with values from M. We design the enhancement to maintain the invariant that the value of *mode* always corresponds with the state condition which the other state variables satisfy. The invariant is satisfied by the initial state (s<sub>0</sub>, *md*(s<sub>0</sub>)). The enhanced update function trivially preserves the original behaviour by ignoring the value of *mode*, but updates *mode* to maintain the invariant by evaluating *md* with the updated state variable values. The update function is therefore defined as ud<sup>+</sup> = (ud ⊗ !<sub>M</sub>); (id<sub>Λ</sub> ⊗ (Δ<sub>S</sub>; (id<sub>S</sub> ⊗ md))), where !<sub>M</sub> : M → 1 = {(*mode*) → ()} introduces an input whose value is discarded. Since ud and md are given as HCTs (e.g. Fig. 8a and b), the enhanced update function can be achieved through composition of tables (e.g. Fig. 8c).
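The enhancement can be sketched for the timer example as follows. The update function `ud` is written from the prose description of the example, and the second mode name `Idle` is a hypothetical placeholder (only *Running* is named in the text). The sketch discards the incoming mode, applies the original update, and recomputes the mode from the updated counter, mirroring the definition of ud<sup>+</sup>.

```python
def ud(start, counter):
    """Timer update function: returns (running, next counter)."""
    if counter > 0:
        return True, counter - 1
    return False, (10 if start else 0)


def md(counter):
    """Map a state to the mode whose state condition it satisfies."""
    return "Running" if counter > 0 else "Idle"   # "Idle" is a made-up name


def ud_plus(start, state):
    """Enhanced update function over the state space S x M."""
    counter, _mode = state                          # the old mode is discarded
    running, counter_next = ud(start, counter)      # original behaviour
    return running, (counter_next, md(counter_next))  # duplicate state, apply md


def simulate(inputs, counter0=0):
    """Run ud_plus from (s0, md(s0)) and record (running, counter, mode) per step."""
    state, trace = (counter0, md(counter0)), []
    for start in inputs:
        running, state = ud_plus(start, state)
        assert state[1] == md(state[0])             # invariant holds after each step
        trace.append((running, state[0], state[1]))
    return trace


print(simulate([True, False, False]))
# [(False, 10, 'Running'), (True, 9, 'Running'), (True, 8, 'Running')]
```

Because the outputs of `ud_plus` never depend on the incoming mode, the observable behaviour is that of `ud`, and the mode component always agrees with the state condition on *counter*.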

This enhanced Mealy machine operates within a subset of the state space S× M where the aforementioned invariant holds. The validity of any state condition can now be deduced from the value of the mode variable (e.g. (*counter* > 0) ⇔ (mode = Running)). Thus, replacing those conditions with the corresponding modes in the HCT representation of ud<sup>+</sup> does not modify its behaviour. This is the final step in rearranging the HCT from Fig. 8c into the STT in Fig. 2b.

#### **5.2 Converting to State Charts and Simplifying**

The state chart in Fig. 9 implements the STT in Fig. 2b by creating a transition for each row and by creating assignment actions to update state and output variables. State charts produced in this manner can often be simplified by moving common actions from transitions to entry/exit actions of modes, or by removing transitions and performing the corresponding actions as during actions. For example, the state chart in Fig. 9 simplifies to the one in Fig. 1b.

In the example given above, it is crucial that the new state variable mode is tracked in addition to the existing variable *counter* . The mode variable tracks the high level system state, but the *counter* variable is still important for tracking the detailed system state. This additional information is not always important, i.e., sometimes the mode is sufficient and the old state variable may be removed from the description of the Mealy machine. This may happen if a Boolean state variable generates a state condition; knowing the value of mode can be sufficient to deduce the value of the original state variable. It is also possible that a state variable from the block diagram stores more detailed information than necessary, and knowing the mode is sufficient for the state chart to act. In these cases, the unnecessary state variables can be removed from the state chart.

## **6 Prototype, Evaluation, and Future Work**

The methodologies presented here have been used to develop a prototype tool which automatically refactors Simulink model fragments to Stateflow [18]. The tool supports a large subset of discrete Simulink blocks typically used for implementation of embedded software. The refactoring tool is implemented in Matlab and integrates with Simulink allowing the user to select the blocks they would like to replace. When the tool is invoked, it generates a Stateflow chart and uses the Simulink Design Verifier [17] to verify that it is equivalent to the selected blocks.

The prototype tool improves the readability of small to medium sized block diagrams such as the one in Fig. 1a. However, we found that the stateful logic of complex industrial-scale models incorporates multiple state machines interacting with each other and with stateless conditional logic. To elegantly represent these complex block diagrams in Stateflow, the translation methodologies presented here can be enhanced to utilize the more sophisticated mechanisms of state charts such as hierarchical/parallel modes. We believe that many state chart mechanisms have analogies in tabular expressions, e.g., hierarchies of state conditions can be leveraged to specify sub-modes. We found that block diagrams encoding more than four high-level modes can often become difficult to understand without these mechanisms.

**Fig. 9.** State chart equivalent to STT

We also recognize the importance of finding refactorable fragments in large models. In fact, the translation methodology presented in this paper was developed in parallel with an identification strategy that pinpoints block diagrams which are candidates for refactoring—it searches for certain patterns of logical and stateful blocks which indicate complex state update logic. An elaborated description of both translation and identification strategies will be presented in the master's thesis of the first author [29].

## **7 Related Work**

Several papers propose translating Simulink block diagrams to formal languages to enable their verification using existing tools (e.g., [1,6,14,23,27,30]). Only a few, however, translate Simulink block diagrams to state transition diagrams. In [19], Simulink block diagrams are converted into an extended version of hybrid automata, with each block in a block diagram converted to a hybrid automaton, leading to an explosion in the number of states of the resulting model. In [31], Simulink models are converted to finite state machines, but transitions between states represent the small execution steps of individual block updates, not changes in the high level system modes. Neither of these studies [19,31], nor [16], aims to capture the high-level state machine of an entire block diagram. This is exactly what our approach does, with maintainability of the resulting model as a prime motivator.

Our approach to modelling Mealy machines and their interactions using the monoidal category **Mealy** follows a general trend in behavioural modelling. For example, monoidal categories have been used to describe interactions of quantum processes [5], labelled transition systems [12], and control systems [3]. The algebra of (traced symmetric) monoidal categories is similar to the algebra used to describe block diagrams in [6], but our approach uses a standard mathematical framework with a rich history and many known results. For example, the results of [9] indicate that by considering equivalence up to bisimilarity, the category **Mealy** is symmetric monoidal, meaning the appropriate axioms and resulting properties of this structure are already known.

## **8 Conclusion**

In this paper, we proposed a method for translating Simulink block diagrams to Stateflow state charts via tabular expressions representing the update functions of their respective Mealy machines. A categorical framework for composing Mealy machines provides a theoretical basis for the translation. To the best of our knowledge, this is the first method for Simulink-to-Stateflow translation. Our proposed method is relevant to industrial development, where it can help improve software maintainability and aid compliance with modelling guidelines.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Metric Temporal Graph Logic over Typed Attributed Graphs

Holger Giese, Maria Maximova, Lucas Sakizloglou, and Sven Schneider

Hasso Plattner Institute, University of Potsdam, Potsdam, Germany {holger.giese,maria.maximova,lucas.sakizloglou,sven.schneider}@hpi.de

Abstract. Various kinds of typed attributed graphs can be used to represent states of systems from a broad range of domains. For dynamic systems, established formalisms such as graph transformation can provide a formal model for defining state sequences. We consider the case where time may elapse between state changes and introduce a logic, called *Metric Temporal Graph Logic* (MTGL), to reason about such timed graph sequences. With this logic, we express properties on the structure and attributes of states as well as on the occurrence of states over time that are related by their inner structure, which no formal logic over graphs concisely accomplishes so far.

Firstly, based on timed graph sequences as models for system evolution, we define MTGL by integrating the temporal operator *until* with time bounds into the well-established logic of (nested) graph conditions. Secondly, we outline how a finite timed graph sequence can be represented as a single graph containing all changes over time (called graph with history), how the satisfaction of MTGL conditions can be defined for such a graph, and show that both representations satisfy the same MTGL conditions. Thirdly, we present how MTGL conditions can be reduced to (nested) graph conditions and show using this reduction that both underlying logics are equally expressive. Finally, we present an extension of the tool AutoGraph that allows checking the satisfaction of MTGL conditions for timed graph sequences by checking the satisfaction of the (nested) graph conditions, obtained using the proposed reduction, for the graph with history corresponding to the timed graph sequence.

Keywords: Nested graph conditions · Metric temporal logic · Sequence properties · Typed attributed graphs · Symbolic graphs

## 1 Introduction

Various kinds of typed attributed graphs are used to represent states of systems from a broad range of domains. Also, the evolution of such systems can be described using a multitude of graph transformation formalisms in which the possible behavior in form of graph sequences is defined by a set of rules and their application. In many cases, the analysis of this induced behavior with respect to a specification in form of a temporal logic that defines the admissible graph sequences is of paramount importance.

*In our running example*, from which we derive the lack of suitable specification formalisms, we consider a dynamic system describing an operating system, which generates timed sequences of (typed attributed) graphs to model the change of the operating system states over time. In this example, users may create tasks with identifiers *id*, the operating system may create handlers specific to task identifiers to allow for the task execution, and the handlers may produce a result when a task has been executed (marking the successful handling of the task). To model the states of the operating system, we employ graphs that store the tasks, the handlers, and the computed results. In the remainder, we refer in the context of this example to the *sequence property* P to be checked w.r.t. the *timed graph sequence* at hand describing the system's state changes over time.

P: Whenever a task T with identifier *id* is created on a system S, a handler H for this task (i.e., with a task identifier *t*\_*id* equal to *id* of T) must exist. Moreover, within 120 timeunits, the handler must produce a result R with value *success* and, during the computation of the result, no other handler H for the same task (i.e., with the same task identifier *t*\_*id*) may exist.

*We consider the problem* that existing specification formalisms for graph-based systems cannot cover properties such as P. The available (metric) temporal logics, such as Metric Temporal Logic (MTL) [16], are defined over Kripke structures abstracting from the system states by labeling each state with a subset of the finite set of atomic propositions. The commonly used operator *until* then allows formalizing the part of property P stating that every graph that contains a task T is followed by some graph containing some result R before $t$ time units have elapsed. However, the existing metric temporal logics do not support the use of *bindings* of elements contained in the graphs to express how a certain matched pattern evolves in a sequence of graphs. Therefore, they are insufficient when, e.g., creating different tasks T and T′ must be followed by creating the *corresponding* results R and R′ while also treating the deadlines for their existence separately.

*As a first contribution*, we define *Metric Temporal Graph Logic* (MTGL) for the concise specification of systems that generate timed graph sequences. In MTGL, we express properties on *states* using the well-known formalism of nested graph conditions [12,24] (called GCs for short). The satisfaction of a GC that states the existence of a graph pattern $H$ in the given graph $G$ results in a *match* $m$ from $H$ to $G$. We extend the logic of GCs to MTGL by extending GCs with the metric temporal operator *until* that may appear in the scope of a previously determined match $m$. Using this extension, we can express properties, such as property P, on the structure and attributes of states as well as on the occurrence of states over time, where the preservation/extension of matches during a system's evolution increases the expressiveness beyond the existing formal logics.

*As a second contribution*, we outline how a finite timed graph sequence can be represented as a single graph containing all changes over time (called *graph with history*), how the satisfaction of MTGL conditions can be defined for such a graph, and show that both representations satisfy the same MTGL conditions.

*As a third contribution*, we show that MTGL conditions can be reduced to GCs using attribute constraints to encode the metric temporal requirements, while preserving the satisfaction for finite timed graph sequences. This encoding enables the direct application of techniques for GCs such as [25].

*As a fourth contribution*, we present an extension of the tool AutoGraph [25] that allows checking the satisfaction of MTGL conditions for timed graph sequences by checking the satisfaction of the GCs obtained using the proposed reduction for the graph with history corresponding to the timed graph sequence at hand.

The paper is structured as follows. Section 2 discusses related work. Section 3 recalls technical preliminaries. Section 4 defines timed graph sequences, MTGL, and the satisfaction of MTGL conditions for timed graph sequences. In Sect. 5, we show how to represent a finite timed graph sequence as a single graph with history, define satisfaction of MTGL conditions for a graph with history, and prove that both representations satisfy the same MTGL conditions. In Sect. 6, we introduce a reduction of MTGL conditions to GCs and show the equivalence of these two logics. Finally, Sect. 7 discusses the tool support and Sect. 8 concludes the paper with a summary and remarks on future work.

## 2 Related Work

There are several related formal and informal approaches for the specification and verification of different kinds of sequence properties.

In [13] the satisfaction of CTL (state/sequence) properties is checked where the tool Groove [10,26] is used to generate the finite state space of the graph transformation system (GTS) at hand. In [7] invariants are checked for a GTS with a possibly infinite state space. The validity of given pre/post conditions for a program over a GTS has been presented in [23]. In [2,15] temporal properties for GTS with infinite state space are checked using the tool Augur2.

In [19] the satisfaction of graph-based probabilistic timed CTL properties is checked where the tool Henshin [1,8] is used to generate the finite state space of a GTS and where the tool Prism [17] is used to model check translations of the given properties. In [6] a sequence of timed events is checked against sequence properties given by regular languages based on deterministic finite automata.

The use of bindings, as in this paper, is supported in [3] where bindings are part of the Metric First-Order Temporal Logic in which system states are represented by a set of relations that are adapted during the execution of the system.

A visual but informal notation for the specification of sequence properties involving time and graph bindings was introduced in [14].

In conclusion, existing approaches with a formal semantics do not support either time, bindings, or graphs in a concise manner. Thereby, our graph-based logic MTGL for graph-based systems complements existing approaches since (a) it eases usability in graph-based contexts similarly to the usage of GCs that are favored over first-order logic in these contexts, (b) it enables further developments and combinations with other graph-based techniques such as those in [25], and, (c) as to be shown by future tool-based evaluations, it can be expected that domain-specific tools for checking MTGL conditions are more efficient compared to general-purpose tools such as shown analogously for GCs in [23].

Fig. 1. The type graph *TG* for our running example where the attributes cts and dts of sort real used in later sections are omitted in every node and edge to improve readability

## 3 Typed Attributed Graphs and Graph Conditions

We now recall typed attributed graphs and nested graph conditions used for representing system states and properties on these states, respectively.

We use *symbolic graphs* [21] to encode (finite) typed attributed graphs. Symbolic graphs are an adaptation of E-Graphs [9] where a graph does not contain data nodes (i.e., elements that represent actual values) but instead node and edge attributes are connected to variables, which replace the data nodes. Symbolic graphs are also equipped with attribute constraints over these (sorted) variables (e.g., $x = 5$, $x \leq 5$, and $y = \text{"aabb"}$).

We consider symbolic graphs that are typed over a type graph *TG* using a typing morphism $\mathit{type} : G \to \mathit{TG}$. Type graphs restrict attributed graphs to an admitted subset. For our running example, we employ the type graph *TG* from Fig. 1. An example of a symbolic graph that is typed over *TG* is given in Fig. 4.

We state the existence and nonexistence of graph patterns in a given symbolic graph, which is called a *host graph*, by representing graph patterns by symbolic graphs and by using monomorphisms (called *monos* and denoted using $\hookrightarrow$ subsequently) to extend graph patterns. Formally, we rely on the notion of nested graph conditions (GCs) [12], which are expressively equivalent to first-order logic on graphs [5] as shown in [12,24].

Definition 1 (Graph Conditions (GCs)). *The class of* graph conditions (GCs) $\Phi^{\mathrm{GC}}_{H}$ *for the graph* $H$ *contains* $\psi$ *if one of the following cases applies.*

- $\psi = \wedge S$ *and* $S = \{\phi_1, \dots, \phi_n\} \subseteq \Phi^{\mathrm{GC}}_{H}$*.*
- $\psi = \neg\phi$ *and* $\phi \in \Phi^{\mathrm{GC}}_{H}$*.*
- $\psi = \exists(a, \phi)$*,* $a : H \hookrightarrow H'$*, and* $\phi \in \Phi^{\mathrm{GC}}_{H'}$*.*

*GCs allow for further abbreviations such as true, false,* $\vee S$*, and* $\forall(a, \phi)$*.*

Intuitively, a GC is satisfied if the positive but not the negative patterns given by the GC can be found in the given host graph. For the case of the *exists* operator, a previously determined match m must be extendable using a mono q according to the mono a from the GC.

Definition 2 (Satisfaction of GCs). *A GC* $\psi \in \Phi^{\mathrm{GC}}_{H}$ *is* satisfied *by a mono* $m : H \hookrightarrow G$*, written* $m \models \psi$*, if one of the following cases applies.*


*A GC* $\psi$ *over the empty graph is satisfied by a graph* $G$*, written* $G \models \psi$*, if* $i_G \models \psi$ *where* $i_G : \emptyset \hookrightarrow G$ *is the initial morphism to* $G$*.*
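The intuition given above (conjunction, negation, and extension of a match by the *exists* operator) can be illustrated by a small brute-force checker. This is only a sketch under simplifying assumptions that are not part of the formal definitions: graphs are plain node and edge dictionaries without attributes or attribute constraints, every mono $a$ of an *exists* condition is an inclusion (so the extended pattern repeats all names of the smaller one), node and edge names are disjoint, and the tuple encoding of conditions as well as the names `extensions` and `sat` are hypothetical.

```python
from itertools import permutations

# A graph is a pair (nodes, edges): nodes maps name -> type,
# edges maps name -> (source, target, type). Attributes are ignored.

def _extend_edges(q, todo, p_edges, h_edges):
    """Injectively map the remaining pattern edges, respecting source/target/type."""
    if not todo:
        yield q
        return
    e, rest = todo[0], todo[1:]
    src, tgt, typ = p_edges[e]
    for f, (s, t, ty) in h_edges.items():
        if f not in q.values() and (s, t, ty) == (q[src], q[tgt], typ):
            yield from _extend_edges({**q, e: f}, rest, p_edges, h_edges)

def extensions(m, pattern, host):
    """Yield every mono q from pattern to host that extends the match m."""
    p_nodes, p_edges = pattern
    h_nodes, h_edges = host
    new_nodes = [n for n in p_nodes if n not in m]
    free_nodes = [n for n in h_nodes if n not in m.values()]
    for image in permutations(free_nodes, len(new_nodes)):
        q = {**m, **dict(zip(new_nodes, image))}
        if all(p_nodes[n] == h_nodes[q[n]] for n in new_nodes):
            new_edges = [e for e in p_edges if e not in m]
            yield from _extend_edges(q, new_edges, p_edges, h_edges)

def sat(m, cond, host):
    """Check a condition given as ('and', [..]), ('not', c), or ('exists', pattern, c)."""
    kind = cond[0]
    if kind == "and":
        return all(sat(m, c, host) for c in cond[1])
    if kind == "not":
        return not sat(m, cond[1], host)
    if kind == "exists":
        _, pattern, sub = cond
        return any(sat(q, sub, host) for q in extensions(m, pattern, host))
    raise ValueError(f"unknown condition kind: {kind}")

TRUE = ("and", [])  # the empty conjunction plays the role of 'true'
```

Checking a condition over the empty graph then amounts to calling `sat({}, cond, host)`, where the empty dictionary stands in for the initial morphism. The enumeration is exponential in the pattern size and is meant for illustration only.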

## 4 Metric Temporal Graph Logic

We build upon GCs [12] and the future fragment of MTL [16,22] to introduce *Metric Temporal Graph Logic* (MTGL) by defining its syntax and semantics.

We assume a graph transformation based formalism for the definition of steps changing a graph while possibly also determining a progress of time. We abstract from the actual timed graph transformation formalism employed and only assume that it is capable of generating so-called *timed graph sequences* (TGSs for short), which contain the graphs, their modifications, and the elapsed time between successive graphs. In the following, we are concerned with TGSs in which either only the past states of sequences are given in the form of *finite* TGSs or where, alternatively, an *infinite* TGS describes a nonterminating evolution of a system.

A step from a graph $G$ to a graph $G'$ where $G$ has remained unchanged for a duration of $\delta$, which may be determined by a timed graph transformation formalism, is represented by $G \cdot (\delta, l, r) \cdot G'$ in our notion of TGSs. In this representation, the monos $l : \mathit{IG} \hookrightarrow G$ and $r : \mathit{IG} \hookrightarrow G'$ identify the graph elements that are preserved from $G$ to $G'$, i.e., $G - l(\mathit{IG})$ are the nodes and edges that are present in $G$ but are deleted to obtain $G'$ and $G' - r(\mathit{IG})$ are the nodes and edges that do not exist in $G$ but are created to obtain $G'$.<sup>1</sup>

Definition 3 (Timed Graph Sequences (TGSs)). *We inductively define the class of finite* timed graph sequences *(TGSs)* $\Pi_{\mathit{fin}}$ *as follows:*


*The class of TGSs* $\Pi$ *contains the finite TGSs* $\Pi_{\mathit{fin}}$ *from above and all infinite sequences that have only finite TGSs from* $\Pi_{\mathit{fin}}$ *as prefixes.*

*Moreover,* $\mathit{dur}(\pi)$ *denotes the sum of all durations* $\delta$ *contained in* $\pi$*. Additionally, if* $\mathit{dur}(\pi) = \infty$*,* $\pi_t$ *denotes the unique graph at time* $t$*, i.e., if* $\pi = G$ *then* $\pi_t = G$ *and if* $\pi = G \cdot (\delta, l, r) \cdot \pi'$ *then (*$\pi_t = G$ *for* $t < \delta$*) and (*$\pi_t = \pi'_{t-\delta}$ *for* $t \geq \delta$*). Finally, if* $\mathit{dur}(\pi) = \infty$*,* $\pi_{[t_1,t_2]}$ *denotes the finite TGS contained in* $\pi$ *between and including* $\pi_{t_1}$ *and* $\pi_{t_2}$*.*

We do not require that every step modifies the current graph (i.e., we permit $G = G'$ possibly using $l = r = \mathit{id}_G$). Also, time may not elapse in a step (i.e., we permit $\delta = 0$) but for well-definedness of the satisfaction relation for TGSs we require that time diverges in every infinite TGS $\pi$ (i.e., $\mathit{dur}(\pi) = \infty$).

In our running example, we simplify the presentation by using only inclusions $l$ and $r$. The TGS $\pi$ given in Fig. 2 contains five graphs $G_i$ for $i \in \{0, 1, 2, 3, 4\}$ showing the system states at five different points in time, namely 0, 5, 10, 13, and 15. The corresponding durations during which the respective graphs $G_i$ remain unchanged are denoted by $\delta_i$ for $i \in \{0, 1, 2, 3\}$.
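As a minimal sketch of these notions, the running example can be encoded with graphs reduced to sets of element names, so that the inclusions $l$ and $r$ are implicit in the shared names. The class `TGS`, its methods, and the element sets (read off from the textual description of the running example in Example 2) are illustrative assumptions, not part of the formalism or of any tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TGS:
    """A finite timed graph sequence with implicit inclusions l and r."""
    graphs: tuple   # G_0, ..., G_n, each a frozenset of element names
    deltas: tuple   # delta_0, ..., delta_{n-1}

    def dur(self):
        """Sum of all durations contained in the sequence."""
        return sum(self.deltas)

    def graph_at(self, t):
        """pi_t: the head graph while t < delta, otherwise continue in the tail with t - delta."""
        for graph, delta in zip(self.graphs, self.deltas):
            if t < delta:
                return graph
            t -= delta
        return self.graphs[-1]

# Running example: states at timepoints 0, 5, 10, 13, and 15.
G0 = frozenset()
G1 = frozenset({"S"})
G2 = G1 | {"T", "H", "e1", "e2"}
G3 = G2 | {"R", "e3", "e4"}
G4 = G3 - {"e3"}
PI = TGS((G0, G1, G2, G3, G4), (5, 5, 3, 2))
```

With this encoding, `PI.graph_at(12)` yields `G2` and `PI.dur()` yields 15, matching the timepoints listed above.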

<sup>1</sup> The span $G \xleftarrow{\,l\,} \mathit{IG} \xrightarrow{\,r\,} G'$ does not correspond to a rule as used in the DPO approach but rather to a rule application describing changes between the graphs $G$ and $G'$.

Fig. 2. A TGS $\pi$ for our running example. For $i \in \{0, 1, 2, 3\}$, the arrows $\overset{\delta_i}{\Longrightarrow}$ between graphs of the TGS describe changes $G_i \cdot (\delta_i, l_i, r_i) \cdot G_{i+1}$ where the inclusions $l_i$ and $r_i$ are implicitly given by the usage of the same names in all graphs.

Fig. 3. The property P from our running example formalized by the MTGC $\psi$

The syntax of MTGL is given by *Metric Temporal Graph Conditions* (short MTGCs) introduced in the following definition. The distinguishing feature of MTGL is the extension of the binding of graph elements used by the operator *exists* in GCs to the *until* operator of MTL. This allows for the formalization of properties where a match into a graph is preserved/extended over multiple timepoints in the subsequently introduced semantics for TGSs.

Definition 4 (Metric Temporal Graph Conditions (MTGCs)). *The class of* metric temporal graph conditions (MTGCs) $\Phi^{\mathrm{MTGC}}_{H}$ *for the graph* $H$ *contains* $\psi$ *if one of the following cases applies.*

- $\psi = \wedge S$ *and* $S = \{\phi_1, \dots, \phi_n\} \subseteq \Phi^{\mathrm{MTGC}}_{H}$*.*
- $\psi = \neg\phi$ *and* $\phi \in \Phi^{\mathrm{MTGC}}_{H}$*.*
- $\psi = \exists(a, \phi)$*,* $a : H \hookrightarrow H'$*, and* $\phi \in \Phi^{\mathrm{MTGC}}_{H'}$*.*
- $\psi = \phi_1\, \mathrm{U}_I\, \phi_2$*,* $I$ *is an interval over* $\mathbf{R}_0$*, and* $\{\phi_1, \phi_2\} \subseteq \Phi^{\mathrm{MTGC}}_{H}$*.*

Further metric temporal operators can be defined as for MTL and GCs.

For our running example, we formalize the property P from Sect. 1 by the MTGC $\psi$ depicted in Fig. 3. In this MTGC, we additionally use the *forall-new* operator in the form of $\forall^{N}(a : H \hookrightarrow H', \phi)$ to match the pattern $H'$ into the considered TGS as soon as possible, i.e., precisely at the minimal timepoint at which all elements of $H'$ exist. This operator can be encoded by the equivalent MTGC $\neg((\neg\exists(a, \neg\phi))\, \mathrm{U}_{[0,\infty)}\, \exists(a, \neg\phi))$, which intuitively states that "there is no violation ever that did not exist before". Moreover, we use notational conventions to simplify our presentation of MTGCs by omitting elements in subconditions. Firstly, we omit nodes (such as *T*) if no new edges or attributes are attached to them. Secondly, we omit edges (such as $e_1$) if no new attributes are attached to them. Finally, we omit attributes (such as *id* of *T*) in general.

The MTGC $\psi$ properly formalizes the property P using the binding capabilities of MTGL as follows: the nodes *T*, *S*, and *H* (together with the edges $e_1$, $e_2$ as well as their attributes) are shared among the two subconditions of the *until* operator. This implies that the Handler node that must be matched by the right subcondition of the *until* operator is the previously bound Handler node *H*. Similarly, the System node that may be matched by the left subcondition of the *until* operator is the previously bound System node *S*.

Next we present the MTGL *semantics for TGSs* that defines when a given TGS satisfies a given MTGC. For the definition of this semantics, we first introduce the concept of a *match that is preserved over a finite number of steps* given by a finite TGS. In the following, we also call such a preserved match a *binding*. The preservation of the match is guaranteed by adapting it according to the renaming determined by the steps of the TGS for the case where these steps do not remove any element initially matched.

Definition 5 (Preserved Match for a Finite TGS). *A mono* $m : H \hookrightarrow G_0$ *is* preserved over a finite TGS $\pi$ *that starts in* $G_0$ *and ends in* $G_n$ *resulting in a mono* $m' : H \hookrightarrow G_n$*, written* $m \overset{\pi}{\rightsquigarrow} m'$*, if one of the following cases applies.*

- $\pi = G_0 = G_n$ *and* $m = m'$*.*
- $\pi = G_0 \cdot (\delta, l : \mathit{IG} \hookrightarrow G_0, r : \mathit{IG} \hookrightarrow G_1) \cdot \pi'$ *and there is* $\bar{m} : H \hookrightarrow \mathit{IG}$ *such that* $m = l \circ \bar{m}$ *and* $r \circ \bar{m} \overset{\pi'}{\rightsquigarrow} m'$*.*

The fact that the step does not remove elements that are matched by a mono $m$ is obtained from the existence of a mono $\bar{m}$ making the triangle $m = l \circ \bar{m}$ commute. The required renaming is then performed by replacing the match $m$ by $r \circ \bar{m}$. The mono $\bar{m}$ is uniquely defined when it exists.
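The preservation of a match over a sequence of steps can be sketched as follows, assuming that monos are represented as injective Python dictionaries from element names to element names and that durations are irrelevant for this check. The function names `preserve_step` and `preserve` are hypothetical and only illustrate the construction described above.

```python
def preserve_step(m, l, r):
    """Adapt the match m (pattern -> G_0) to the next graph for one step.

    l maps the interface graph IG into G_0 and r maps IG into G_1.
    Returns the renamed match into G_1, or None if the step deletes
    an element matched by m.
    """
    l_inverse = {target: source for source, target in l.items()}
    if any(element not in l_inverse for element in m.values()):
        return None  # some matched element is not preserved by the step
    # The intermediate mono into IG is unique because l is injective;
    # composing it with r yields the renamed match.
    return {pattern: r[l_inverse[element]] for pattern, element in m.items()}

def preserve(m, steps):
    """Propagate m over a list of steps (l, r); None if the match is lost."""
    for l, r in steps:
        m = preserve_step(m, l, r)
        if m is None:
            return None
    return m
```

For a sequence without steps, `preserve` returns the match unchanged, which corresponds to the first case of the definition.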

Based on the preservation of matches, we now define the semantics for TGSs.

Definition 6 (Satisfaction of MTGCs by TGSs). *A given MTGC* $\psi \in \Phi^{\mathrm{MTGC}}_{H}$ *is* satisfied *by a TGS* $\pi$*, an observation timepoint* $t \in \mathbf{R}_0$*, and a mono* $m : H \hookrightarrow \pi_t$*, written* $(\pi, t, m) \models_{\mathrm{TGS}} \psi$*, if one of the following cases applies.*

- $\psi = \phi_1\, \mathrm{U}_I\, \phi_2$ *and there is* $t' \in I$ *such that*
	- *there is* $m' : H \hookrightarrow \pi_{t+t'}$ *s.t.* $m \overset{\pi_{[t,t+t']}}{\rightsquigarrow} m'$ *and* $(\pi, t + t', m') \models_{\mathrm{TGS}} \phi_2$ *and*
	- *for every* $t'' \in [0, t')$ *it holds that there is an* $m'' : H \hookrightarrow \pi_{t+t''}$ *such that* $m \overset{\pi_{[t,t+t'']}}{\rightsquigarrow} m''$ *and* $(\pi, t + t'', m'') \models_{\mathrm{TGS}} \phi_1$*.*

*An MTGC* $\psi$ *over the empty graph is satisfied by a TGS* $\pi$*, written* $\pi \models_{\mathrm{TGS}} \psi$*, if* $(\pi, 0, i_{\pi_0}) \models_{\mathrm{TGS}} \psi$ *where* $i_{\pi_0} : \emptyset \hookrightarrow \pi_0$ *is the initial morphism to the graph at timepoint* 0 *of* $\pi$ *(i.e., the first graph of* $\pi$*).*

This semantics is similar to the semantics of GCs for *conjunction*, *negation*, and the *exists* operator since for the triple $(\pi, t, m)$ it always holds that the codomain of $m$ is the graph $\pi_t$ and since the checked MTGC is defined for the domain of $m$. The TGS $\pi$ and the current timepoint $t$ are used in the case for the *until* operator where we rely on the *preserved match* relation from above to change the codomain of a match from $\pi_t$ to the graphs $\pi_{t+t'}$ and $\pi_{t+t''}$ at later timepoints.

*Example 1 (TGS satisfies MTGC).* Considering our running example, we argue that the MTGC given in Fig. 3 is satisfied by the TGS given in Fig. 2. Firstly, the *forall-new* operator matches the nodes *T*, *S* and the edge $e_1$ in $G_2$ at timepoint 10, which is the maximal creation timepoint of these three elements. Then, the *exists* operator matches the node *H* together with the edge $e_2$ in $G_2$ at the same timepoint. Finally, the *until* operator subsequently matches the node *R* and the edge $e_3$ in $G_3$ at the timepoint 13 and the remainder *true* is trivially satisfied for the timepoint 13. In addition, as also required by the *until* operator, for every timepoint in the interval $[10, 13)$, it is not possible to match a second Handler node *H* that is connected to *S*. This holds because the graph in $\pi$ for the timepoints in this interval is the graph $G_2$, which indeed does not contain such a second Handler node.

## 5 Mapping of TGSs to Graphs with History

Subsequently, we are concerned with finite TGSs $\pi$ (which have a finite number of steps and therefore also satisfy $\mathit{dur}(\pi) < \infty$) for which the satisfaction of an MTGC $\psi$ is decidable [4] when replacing in $\psi$ right-open intervals $[r, \infty)$ and $(r, \infty)$ by $[r, \mathit{dur}(\pi))$ and $(r, \mathit{dur}(\pi))$, respectively. Such an adaptation of intervals leads to an MTGC $\psi'$ that is *bounded* and for which the satisfaction by the finite TGS $\pi$ is equivalent (i.e., $\pi \models_{\mathrm{TGS}} \psi \iff \pi \models_{\mathrm{TGS}} \psi'$).

To analyze the satisfaction of an MTGC by a given finite TGS, we now introduce the notion of *graphs with history* (in short, GHs) as an equivalent representation of a given finite TGS. Afterwards, we introduce a semantics operating on this alternative representation (called in the following *semantics for GHs*) that is compatible with the semantics introduced before for TGSs. The translation from finite TGSs to GHs reduces the size of the representation in terms of the stored data. Moreover, it decouples the observation of modifications, resulting in a GH, and the subsequent satisfaction check for possibly several MTGCs.

The notion of GHs for capturing the changes to a current graph over time as given by a TGS π, requires that the used type graph *TG* contains for all nodes and edges the attributes cts and dts of sort real to capture the total timepoint at which an element was created and (if applicable) deleted, respectively.<sup>2</sup>

Definition 7 (Graphs with History (GHs)). *Let TG be a type graph where all nodes and edges have attributes* cts *denoting the timepoint of their creation and* dts *denoting the timepoint of their deletion. Then* $G_H$ *is a* graph with history (GH) *if it is typed over TG satisfying the following consistency requirements.*<sup>3</sup>


We now define the operation $\mathcal{F}\mathit{old}$, which converts a finite TGS $\pi$ (i.e., a TGS with a finite number of steps) into the corresponding GH $G_H$. This recursive operation handles the renaming given by the monos $l$ and $r$ in the steps of $\pi$ and, moreover, encodes the insertion of additional nodes/edges $\alpha$ by adding attributes cts = $t$ for these nodes/edges in the constructed $G_H$ and by equipping removed nodes/edges $\alpha$ with an additional attribute dts = $t$ where $t$ is the current total time of the considered TGS $\pi$ in both cases.

Definition 8 (Map TGS to GH (Operation $\mathcal{F}\mathit{old}$)).


The following example covers an application of $\mathcal{F}\mathit{old}$ to a finite TGS.

<sup>2</sup> The total timepoints of additions and removals of attributes and their values can be encoded by moving attributes into separate nodes, for which their cts and dts attributes then encode the relevant timepoints.

<sup>3</sup> Note that the consistency requirements used in this definition are not guaranteed by the formalisms of E-Graphs or symbolic graphs.

Fig. 4. Mapping of the TGS $\pi$ from Fig. 2 to the GH $G_H = \mathcal{F}\mathit{old}(\pi)$

*Example 2 (Map TGS to GH).* We map the finite TGS $\pi$ from Fig. 2 to the GH $G_H$ shown in Fig. 4 using the operation $\mathcal{F}\mathit{old}$ as follows. Since $\pi$ starts with an empty graph $G_0$, we first map it into the empty GH. The second state of $\pi$ given by $G_1$ including the System node *S* is added to the TGS after 5 timeunits. We map this TGS state to the GH by adding *S* to the empty GH and by, additionally, equipping this node with the creation timepoint cts = 5. After another 5 timeunits, an additional Task node *T*, a Handler node *H*, and edges $e_1$, $e_2$ between the existing System node *S* and the new Task node *T* resp. the new Handler node *H* are added to the TGS resulting in the TGS state $G_2$. These changes are again mapped to the GH by adding the Task node *T*, the Handler node *H*, and the edges $e_1$, $e_2$ to the current version of $G_H$ as well as by additionally equipping them with the creation timepoints cts = 10. In a similar manner, the Result node *R* together with the edges $e_3$ and $e_4$ (see the TGS state $G_3$) are added to the GH with the creation timepoints cts = 13. Finally, after 2 timeunits, the edge $e_3$ is deleted to obtain the TGS state $G_4$. To reflect this in the GH, we add to the edge $e_3$ in $G_H$ the additional deletion timepoint dts = 15.
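The bookkeeping performed in this example can be sketched in a few lines, under the same simplification as in the running example (inclusions only, so elements are identified by name) and the further assumption that names are never reused after a deletion. The function `fold` and the use of `math.inf` as a stand-in for a missing dts attribute are illustrative choices, not the definition of the operation itself.

```python
import math

def fold(graphs, deltas):
    """Return {element: (cts, dts)} for a finite TGS with implicit inclusions.

    graphs: G_0, ..., G_n as sets of element names.
    deltas: delta_0, ..., delta_{n-1}.
    dts is math.inf for elements that are never deleted (assumption).
    """
    history = {}
    total_time = 0
    previous = set()
    for index, graph in enumerate(graphs):
        if index > 0:
            total_time += deltas[index - 1]
        for element in graph - previous:       # created in this step
            history[element] = (total_time, math.inf)
        for element in previous - graph:       # deleted in this step
            history[element] = (history[element][0], total_time)
        previous = graph
    return history

# Running example with element names taken from the description above.
G0 = frozenset()
G1 = frozenset({"S"})
G2 = G1 | {"T", "H", "e1", "e2"}
G3 = G2 | {"R", "e3", "e4"}
G4 = G3 - {"e3"}
GH = fold([G0, G1, G2, G3, G4], [5, 5, 3, 2])
```

For this input, `GH["S"]` is `(5, math.inf)` and `GH["e3"]` is `(13, 15)`, in line with the creation and deletion timepoints derived in the example.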

For the satisfaction of an MTGC of the form $\exists(a : H \hookrightarrow H', \phi)$, where the *exists* operator is inherited from GCs, it is still required that the pattern that is found so far (given by some mono $m : G \hookrightarrow G_H$) in the host graph $G_H$ can be extended to a larger pattern (given by some mono $m' : G' \hookrightarrow G_H$). Additionally, we have to check that all matched elements are already created (because the GH also contains the elements created with higher cts values) but not yet deleted (because the GH also contains the elements deleted at earlier timepoints). For the satisfaction of an MTGC of the form $\phi_1\, \mathrm{U}_I\, \phi_2$, where the *until* operator is inherited from MTL, it is still required that $\phi_2$ must be satisfied at some timepoint $t'$ in the interval $I$ relative to the current observation timepoint $t$ and that $\phi_1$ is continuously satisfied (by a possibly varying match for each timepoint) for all timepoints preceding $t'$.

Definition 9 (Satisfaction of MTGCs by GHs). *An MTGC* $\psi \in \Phi^{\mathrm{MTGC}}_{H}$ *is satisfied by a mono* $m : H \hookrightarrow G_H$ *and an observation timepoint* $t \in \mathbf{R}_0$*, written* $(m, t) \models_{\mathrm{GH}} \psi$*, if* $\max(\{0\} \cup \mathit{cts}(m(H))) \leq t < \min(\{\infty\} \cup \mathit{dts}(m(H)))$ *and one of the following cases applies.*


*An MTGC* $\psi$ *over the empty graph is satisfied by a GH* $G_H$*, written* $G_H \models_{\mathrm{GH}} \psi$*, if* $(i_{G_H}, 0) \models_{\mathrm{GH}} \psi$ *where* $i_{G_H} : \emptyset \hookrightarrow G_H$ *is the initial morphism to* $G_H$*.*
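The side condition $\max(\{0\} \cup \mathit{cts}(m(H))) \leq t < \min(\{\infty\} \cup \mathit{dts}(m(H)))$ can be illustrated as follows. The sketch assumes that a GH is given as a dictionary from element names to pairs (cts, dts), with `math.inf` standing in for a missing dts attribute; the function name `alive` and the example values (taken from Example 2) are illustrative.

```python
import math

def alive(history, matched, t):
    """Check that all matched elements are created but not yet deleted at time t."""
    created = max([0] + [history[element][0] for element in matched])
    deleted = min([math.inf] + [history[element][1] for element in matched])
    return created <= t < deleted

# cts/dts values of the GH from Example 2 (math.inf: never deleted).
GH = {
    "S": (5, math.inf),
    "T": (10, math.inf), "H": (10, math.inf),
    "e1": (10, math.inf), "e2": (10, math.inf),
    "R": (13, math.inf), "e3": (13, 15), "e4": (13, math.inf),
}
```

For instance, the match of *T*, *S*, and $e_1$ is admissible from timepoint 10 onwards, whereas a match containing $e_3$ is admissible only for timepoints in $[13, 15)$. For the empty match the condition reduces to $0 \leq t < \infty$.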

Note that the reasoning for the satisfaction of the MTGC $\psi$ from Fig. 3 by $G_H = \mathcal{F}\mathit{old}(\pi)$ from Fig. 4 proceeds analogously to Example 1.

In the following theorem (see [11] for its proof), we state the compatibility of the two satisfaction relations for the case of finite TGSs showing that they can be used interchangeably to determine the satisfaction of an MTGC in this case.

Theorem 1 (Soundness of Operation $\mathcal{F}\mathit{old}$). *If* $\pi \in \Pi_{\mathit{fin}}$ *and* $\psi \in \Phi^{\mathrm{MTGC}}_{\emptyset}$ *then* $\pi \models_{\mathrm{TGS}} \psi$ *iff* $\mathcal{F}\mathit{old}(\pi) \models_{\mathrm{GH}} \psi$*.*

## 6 Reduction of MTGL to GCs

We now introduce a procedure for checking the satisfaction of an MTGC by a GH using a reduction of an MTGC to a corresponding GC. Based on the $\mathcal{F}\mathit{old}$ operation from the previous section, we thereby obtain a checking procedure for finite TGSs as well. Moreover, this reduction shows that MTGL is as expressive as the logic of GCs on finite TGSs (since every GC is trivially also an MTGC).

We first present the operation $\mathit{Reduce}$ for translating an MTGC into the corresponding GC and then show that this translation (also called *reduction* in the following) is compatible with our semantics for GHs and the operation $\mathit{Fold}$ from before. The operation $\mathit{Reduce}$ encodes in the resulting GC all parts of the satisfaction relation $\models_{\mathrm{GH}}$ that are not covered by the satisfaction relation $\models$ for GCs. In particular, the operation $\mathit{Reduce}$ removes all occurrences of the *until* operator and encodes the check that the elements that are matched by the *exists* operator have all been created as well as that none of them has yet been deleted.

Technically, we translate a GH $G_H = \mathit{Fold}(\pi)$ for a finite TGS $\pi$, an MTGC $\psi \in \Phi^{\mathrm{MTGC}}_{\emptyset}$, and an observation timepoint $t \in \mathbf{R}_0$ (where $G_H$ and $\psi$ are typed over a type graph $\mathit{TG}$) into a graph $G'_H$ and a GC $\psi' \in \Phi^{\mathrm{GC}}_{\emptyset}$ (where both are typed over a changed type graph $\mathit{TG}'$) using the procedure presented in Definition 10. We obtain $\psi'$ from $\psi$ by encoding the *until* operator suitably and by implementing the checks of cts and dts attributes according to Definition 9 for the *exists* and *until* operators using attribute constraints, for which we add variables to $\psi'$. We also add the same variables to $G_H$ to obtain $G'_H$.

Definition 10 (Reduce MTGC to GC (Operation $\mathit{Reduce}$)). *The recursive operation $\mathit{Reduce}$ takes 3 arguments: a GH $G_H$ that has been obtained by application of the operation $\mathit{Fold}$ to a TGS $\pi$, an observation timepoint $t \in \mathbf{R}_0$, and an MTGC $\psi \in \Phi^{\mathrm{MTGC}}_{\emptyset}$. $G_H$ and all graphs contained in $\psi$ are typed over the type graph $\mathit{TG}$.*

*The operation $\mathit{Reduce}$ returns a pair $(G'_H, \psi')$ consisting of a graph $G'_H$ (which is a slight modification of $G_H$) and a GC $\psi' \in \Phi^{\mathrm{GC}}_{\emptyset}$. The graph $G'_H$ and all graphs contained in $\psi'$ are typed over an adapted type graph $\mathit{TG}'$ (called a* reduction type graph*) introduced below.*


Fig. 5. The GC $\psi'$ and the adapted graph $G'_H$ resulting from applying the operation $\mathit{Reduce}$ to the GH from Fig. 4, the timepoint $t = 10$, and the MTGC $\psi$ from Fig. 3 (where the outermost *forall-new* operator has been simplified to the *forall* operator)

*3. (Construction of the GC $\psi'$): $\psi' = \exists(i_{G_0}, \mathit{Reduce}_{\mathit{rec}}(\psi_{\mathit{att}}, x_0, G_0, \emptyset))$ where $G_0$ is the graph containing the Encoding node $v_0$ with the attributes* num = 0*,* var = $x_0$ *as well as the attribute constraint $x_0 = t$ and $i_{G_0} : \emptyset \hookrightarrow G_0$ is the initial morphism to $G_0$.* *Then, $\mathit{Reduce}_{\mathit{rec}}(\psi_{\mathit{att}}, x_o, G_a, G) = \psi'_{\mathit{att}}$ if one of the following cases applies (where $\psi_{\mathit{att}}$ is the condition to be reduced, $x_o$ is the timepoint at which the subcondition must be satisfied, $G_a$ is the graph containing additional nodes, edges, and attribute constraints to be added to the graphs in conditions constructed, and $G$ is the graph over which the condition $\psi_{\mathit{att}}$ is defined).*

	- *We obtain $G'_H$ by adding elements to $G_H$ as follows:*
	- *(a) We add the attribute* dts = $-1$ *to all nodes/edges without that attribute.*
	- *(b) We insert all Encoding nodes contained in graphs in $\psi'$ together with their* num = $n$ *and* var = $x_n$ *attributes.*
	- *(c) We add the attribute constraints added during the reduction except for the* alive *constraints.*

We now demonstrate how the operation $\mathit{Reduce}$ can be applied to the MTGC from our running example.

*Example 3 (Reduce MTGC to GC).* We now apply the $\mathit{Reduce}$ operation to the GH from Fig. 4, the timepoint $t = 10$, and the MTGC $\psi$ from Fig. 3, resulting in $G'_H$ and $\psi'$ given in Fig. 5. However, to simplify the presentation, we replaced the enclosing *forall-new* operator by the *forall* operator to avoid the substitution of the *forall-new* operator by its encoding from Sect. 4.

1. We add the attribute dts = $x_{d,\alpha}$ to all nodes/edges $\alpha$ of $G_H$ without dts attribute and add the attribute constraint $x_{d,\alpha} = -1$ to the set of constraints.

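<sup>4</sup> For a graph $H$, $\mathrm{alive}(x, H)$ equals $\mathrm{alive}(x, S)$ for the disjoint union $S$ of the nodes and edges of $H$. For a set $S$ of nodes and edges, $\mathrm{alive}(x, S)$ equals $\bigcup\{\mathrm{alive}(x, \alpha) \mid \alpha \in S\}$. For a node or an edge $\alpha$, $\mathrm{alive}(x, \alpha)$ equals $\{x_{c,\alpha} \le x,\ x_{d,\alpha} = -1 \vee x < x_{d,\alpha}\}$.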

With these additional attributes and the cts = $x_{c,\alpha}$ attributes introduced by the operation $\mathit{Fold}$, we are able to state the existence of nodes/edges at a given timepoint $x_n$ using attribute constraints in the resulting GC $\psi'$.
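The *alive* constraints from the footnote can be read as a predicate over the creation and deletion variables, where the value −1 encodes "not deleted". The sketch below evaluates these constraints for concrete values; in the tool they are handed to a constraint solver instead, and the function names and the pair-based element representation are assumptions for illustration.

```python
def alive(x, xc, xd):
    """alive(x, α): created no later than x and (never deleted or deleted after x)."""
    return xc <= x and (xd == -1 or x < xd)


def alive_all(x, elements):
    """alive(x, S): conjunction of the per-element constraints.

    elements: iterable of (xc, xd) pairs, one per node or edge.
    """
    return all(alive(x, xc, xd) for xc, xd in elements)


# Two elements: one created at 0 and never deleted, one created at 5, deleted at 12.
both_alive_at_10 = alive_all(10, [(0, -1), (5, 12)])
both_alive_at_12 = alive_all(12, [(0, -1), (5, 12)])
```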


In the following theorem (see [11] for its proof), we state that the operation $\mathit{Reduce}$ is sound w.r.t. the satisfaction relations for MTGCs and GCs.

Theorem 2 (Soundness of Operation $\mathit{Reduce}$). *If $\pi \in \Pi^{\mathit{fin}}$, $G_H = \mathit{Fold}(\pi)$, $\psi \in \Phi^{\mathrm{MTGC}}_{\emptyset}$, $t \in \mathbf{R}_0$ is a timepoint, $i_{G_H} : \emptyset \hookrightarrow G_H$ is the initial morphism to $G_H$, and $(G'_H, \psi') = \mathit{Reduce}(G_H, t, \psi)$, then $(i_{G_H}, t) \models_{\mathrm{GH}} \psi$ iff $G'_H \models \psi'$.*

By application of Theorem 2, we can deduce for our running example that the MTGC $\psi$ from Fig. 3 translated by the operation $\mathit{Reduce}$ is satisfied by the graph $G'_H$ (both given in Fig. 5). For this purpose, observe that $\psi$ from Fig. 3 (simplified as stated in Fig. 5) is satisfied by the GH from Fig. 4 for the timepoint $t = 10$ since the unique match of the Task node *T*, the on edge $e_1$, and the System node *S* satisfies the remaining condition starting at timepoint $t = 10$.

## 7 Tool Support

We provide tool support for checking finite TGSs against MTGCs as an extension of AutoGraph [25]. Firstly, we extended the support of AutoGraph to handle TGSs and MTGCs. Secondly, we implemented the operation $\mathit{Fold}$ from Definition 8 to consolidate a TGS $\pi$ into a GH $G_H$. Thirdly, we implemented the operation $\mathit{Reduce}$ from Definition 10 to reduce an MTGC $\psi$ to a GC $\psi'$ and to adapt $G_H$ to a graph $G'_H$. On the foundation of these three steps and as applications of our theoretical results (see Theorems 1 and 2), we then use the built-in support of AutoGraph for checking whether the obtained graph $G'_H$ satisfies the reduced GC $\psi'$. Note that AutoGraph depends in this scenario on the constraint solver Z3 [20] to check satisfiability of expressions involving the values of cts and dts attributes of sort real as well as the additional constraints introduced by $\mathit{Reduce}$ that contain further variables of sort real.

Considering our running example, we observed negligible runtime and memory consumption when verifying that the finite TGS $\pi$ from Fig. 2 satisfies the MTGC $\psi$ from Fig. 3 using our implementation, which is due to the short length of $\pi$. Overall, the application of the AutoGraph extension *to our running example* shows promising results, although there is room for further efficiency improvements when handling more elaborate problem instances.

## 8 Conclusion and Future Work

We defined *Metric Temporal Graph Logic* (MTGL) by integrating the metric temporal operator *until* with time bounds into the well-established logic of (nested) graph conditions (GCs). This new logic allows maintaining an established binding of graph elements throughout the analysis of a timed sequence of (typed attributed) graphs (TGSs). Furthermore, to enable a satisfaction check for MTGL conditions by finite TGSs, we introduced a mapping of a finite TGS $\pi$ into a graph with history $G_H = \mathit{Fold}(\pi)$ and defined a reduction of an MTGL condition $\psi$ to a GC $\psi'$ given by $(G'_H, \psi') = \mathit{Reduce}(G_H, 0, \psi)$ where the graph with history $G_H$ is extended to a graph $G'_H$. For this mapping and this reduction, we have proven that the satisfaction checks for the different representations are consistent (i.e., $\pi \models_{\mathrm{TGS}} \psi \iff G_H \models_{\mathrm{GH}} \psi \iff G'_H \models \psi'$). Finally, we presented an extension of the tool AutoGraph that allows checking the satisfaction of MTGL conditions by finite TGSs via the introduced mapping and reduction.

In the future, we want to develop checking procedures for bounded MTGL conditions such that only violations that hold for any possible continuation are reported. Moreover, we intend to use our reduction of MTGL conditions to related GC counterparts for invariant checking for graph transformation systems as considered in [7]. Furthermore, we want to develop extensions of MTGL that include branching such as in timed CTL, that are applicable to the setting of probabilistic timed graph transformation systems as introduced in [19], or that support additional features, e.g., permitting variables in the interval bounds of MTGL conditions or in attribute constraints. Finally, we intend to develop a model checking procedure for MTGL and these extensions. Besides these technical advancements, we intend to evaluate and compare our approach based on benchmarks from application domains such as runtime monitoring [18].

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **KUPC: A Formal Tool for Modeling and Verifying Dynamic Updating of C Programs**

Jiaqi Qian<sup>1</sup>, Min Zhang<sup>1(B)</sup>, Yi Wang<sup>2</sup>, and Kazuhiro Ogata<sup>3</sup>

<sup>1</sup> Shanghai Key Lab of Trustworthy Computing, ECNU, Shanghai, China, zhangmin@sei.ecnu.edu.cn

<sup>2</sup> GCCIS, Rochester Institute of Technology, Rochester, NY, USA

<sup>3</sup> Japan Advanced Institute of Science and Technology, Nomi, Japan

**Abstract.** Dynamic Software Updating (DSU) is a useful technique for updating running software without incurring any downtime. Its correctness must be guaranteed because updating running software is a complicated and safety-critical process. In this paper, we present a formal tool called KupC for modeling and verifying dynamic updating of C programs. The tool is built on K, a formal semantic framework for programming languages. We formalize a patch-based dynamic updating mechanism in K based on the formal executable operational semantics of C. The formalization automatically yields an interpreter and several verification tools, which can be used to formally analyze the correctness of dynamic updating for C programs. To our knowledge, KupC is the first formal tool for code-level verification of dynamic software updating.

## **1 Introduction**

Software systems require frequent updating to fix defects, improve performance, and add new features. For systems with a 24 × 7 service commitment, Dynamic Software Updating (DSU) is a useful technique as it does not incur system downtime while updating [5]. Such systems are becoming prevalent with the diffusion of the Internet of Things (IoT) and Cyber-Physical Systems (CPS), where additions, modifications, and removal of behaviors can be done in a quick and localized fashion. A comprehensive survey on DSU is given in [10].

The difficulty of guaranteeing the correctness of dynamic updating is a fundamental barrier to adopting this technique as widely as expected. Correctness is crucial to the systems that need dynamic updating because they are usually safety-critical and highly dependable. Meanwhile, dynamically updating a running software system is a complicated process, and it is difficult to predict all possible updating results. In order to update a program successfully while it is running, in practice one has to know everything about that program [6]. However, effective methodologies and tools that help understand all possible behaviors of running programs caused by updating are still lacking.

This work was supported by NSFC Project grants 61502171 and 61872146, and China HGJ Project under Grant 2017ZX01038102-002.

R. Hähnle and W. van der Aalst (Eds.): FASE 2019, LNCS 11424, pp. 299–305, 2019. https://doi.org/10.1007/978-3-030-16722-6\_17

Formal methods are rigorous approaches to program verification. Some attempts have been made at applying formal methods to DSU [3,4]. The existing approaches suffer from one or more of the following difficulties. In some approaches, formalizing a dynamic update may require abstraction of target programs. Such abstraction is usually done manually. It requires both formal methods expertise and human insight to interpret target programs. Some approaches [1,11] lack tool support, while developing such tools requires substantial effort.

To mitigate the above difficulties, we present in this paper a formal tool called KupC for modeling and verifying dynamic updating of C programs. KupC is built upon the formalization of a DSU tool for C programs called Ginseng [8]. We formalize the updating strategy of Ginseng atop the operational semantics of C in the formal semantic framework called K [9]. From the formalization, K automatically generates several tools that can be used for formal analysis of dynamic updating of C programs. To our knowledge, KupC is the first tool for the code-level formal verification of dynamic software updating.

KupC has the following three features. (1) KupC is focused on the code-level verification of dynamic updating. It does not require any abstraction or transformation of target C programs that are subject to dynamic updating. (2) The verification functionalities of KupC are automatically generated from the formalization of dynamic updating mechanisms. No extra implementation effort is needed. (3) The formalization is built upon the operational semantics of the C language. One can easily develop similar tools for the formal analysis of dynamic updating of other languages such as Java and Python, whose operational semantics have already been formally defined in K.

# **2 KUPC Design**

**Patch-based DSU.** Many DSU tools achieve dynamic updating by injecting patches into running programs [10]. A patch contains all updating contents, e.g., new functions and data. Figure 1 (left) is an overview of the patch-based updating process. An old-version program is first made updatable by attaching additional version information, wrapping user-defined types, and inserting possible updating points. These steps are achieved by the two operations called *Dependants Updating* and *Restriction Generating*. Next, a patch file *p1.c* is generated by comparing the differences between the old and new programs and is then compiled. After an update request is invoked, a DSU tool checks whether it is safe to inject the compiled patch whenever the running program reaches a pre-specified updating point. Safety means that the behavior of the updated program is consistent with the expectation. It is guaranteed by the updating policies adopted in DSU tools.

**Fig. 1.** Patch-based dynamic updating and its formalization using K

If it is safe, the patch is injected and the running program state is transformed into the new version by a transformation function that is predefined in the patch. The patched program continues to execute from the new state. If updating at this point is not safe, the program continues to execute the old version.
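The control flow described above (check safety at each updating point, transform the state if safe, otherwise continue with the old version) can be sketched as a toy interpreter. This is only an illustration of the idea, not the mechanism of Ginseng or KupC; the `Patch` class, the program encoding, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Patch:
    is_safe_at: Callable[[str], bool]   # updating policy (assumed interface)
    transform: Callable[[int], int]     # state transformation shipped with the patch


def run(program, state, patch=None):
    """Run a toy program: strings are updating points, dicts map versions to code."""
    version, trace = "old", []
    for step in program:
        if isinstance(step, str):
            # Updating point: inject the patch only if the policy deems it safe.
            if patch is not None and version == "old" and patch.is_safe_at(step):
                state = patch.transform(state)   # performed atomically in this toy model
                version = "new"
                trace.append(f"updated@{step}")
        else:
            state = step[version](state)         # execute the current version's code
    return state, version, trace


program = [
    "p1", {"old": lambda s: s + 1, "new": lambda s: s + 10},
    "p2", {"old": lambda s: s + 1, "new": lambda s: s + 10},
]
patch = Patch(is_safe_at=lambda point: point == "p2", transform=lambda s: s * 2)
result = run(program, 0, patch)
```

In this example the update is rejected at `p1`, so the first block runs in the old version, and the patch is applied at `p2` before the second block runs in the new version.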

It is worth mentioning that the entire updating process is performed atomically, that is, the execution remains suspended until the updating has completed. Updating in an atomic manner is the most consistent approach; it simplifies the updating process and reduces unexpected errors.

**The** K **Framework.** K [9] is a state-of-the-art semantic framework for programming languages. Many mainstream languages such as C and Java have been completely defined in K. One only needs to focus on the formalization of an updating mechanism using the pre-defined operational semantics of the targeted language. After the updating mechanism has been formalized, K automatically generates several analysis tools such as a program interpreter, a state space explorer, and a model checker.

**Formalization of dynamic updating strategy in** K**.** The basic idea of formalizing a dynamic updating mechanism using K is to formalize the functionalities of the mechanism on the basis of the operational semantics of the target programming language that the mechanism supports. The right part of Fig. 1 shows the formalization of the patch-based dynamic updating mechanism, consisting of the formalization of the five functionalities, respectively.

The functionalities of an updating mechanism are formalized by a set of rewrite rules. For instance, below is a rewrite rule that formalizes the function of checking the safety of updating a set of functions at an updating point *Loc*.

$$\begin{array}{l} \langle\, \mathit{TypeSafety}(\mathit{Loc}, \{\tfrac{F}{\cdot}\ \_\})\ \cdots \rangle \quad \langle \cdots\ F \mapsto T\ \cdots \rangle \quad \langle \cdots\ F \mapsto T'\ \cdots \rangle \\ \textbf{when}\ \big((F \in \mathit{Re}) \wedge (T == T')\big) \vee (F \notin \mathit{Re}) \end{array}$$

**Fig. 2.** The snippets of old-version and new-version programs of a GPS application

In the rule, a pair of brackets is a labeled *cell*, representing a piece of program execution information. The notation of *F* written over · means that *F* is deleted from the set if the condition that follows the keyword *when* is true. The condition says that either *F* is updatable (represented by *F* ∉ *Re*) or it is un-updatable at the point *Loc* but its types *T* and *T'* (before and after updating, respectively) are the same. Here, *Re* is the set of un-updatable contents at *Loc*. If the second argument of *TypeSafety* becomes an empty set, it means all the functions in the set are safe to update.
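The effect of repeatedly applying this rule can be sketched outside of K as a simple set-based check: each function is removed from the pending set when the side condition holds, and updating at the point is type-safe if the set becomes empty. The function below is an illustrative reading of the rule, not the K definition itself; the dictionary-based type environments and the function names `f` and `g` are assumptions.

```python
def type_safety(funcs, restricted, old_types, new_types):
    """Return True if all functions in `funcs` are safe to update at a point.

    restricted: the set Re of un-updatable contents at the updating point.
    old_types / new_types: function name -> type before / after updating.
    """
    pending = set(funcs)
    for f in sorted(pending):  # iterate over a copy so that removal is safe
        # Side condition: (F ∈ Re ∧ T == T') ∨ (F ∉ Re)
        if f not in restricted or old_types[f] == new_types[f]:
            pending.discard(f)
    return not pending  # empty set: every function passed the check


old = {"f": "int(int)", "g": "void(void)"}
same = {"f": "int(int)", "g": "void(void)"}
changed = {"f": "int(int, int)", "g": "void(void)"}

safe_when_unchanged = type_safety({"f", "g"}, {"f"}, old, same)
safe_when_changed = type_safety({"f", "g"}, {"f"}, old, changed)
```

Here `f` is restricted at the updating point, so the update is accepted only if the type of `f` is unchanged; `g` is unrestricted and always passes.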

In total, we defined 371 rewrite rules to formalize the updating mechanism of Ginseng. We tested the correctness of the rules using the example dynamic updating programs provided with Ginseng. These rules are seamlessly compiled by K together with the rules defined for the operational semantics of C [2]. The compilation yields the formal tool KupC, which supports formal analysis of dynamic updating of C programs in various ways such as simulation, state exploration, and LTL model checking.

# **3 KUPC Usage**

KupC is equipped with an interpreter to *execute* updatable C programs, a state space explorer to search for all possible updating results, and an LTL model checker to verify temporal properties of dynamic updating. We demonstrate the usage of KupC using a dynamic update of a GPS application. The tool, examples, and a demo video are available at https://github.com/dexter-qjq/KupC.

The program in Fig. 2 (left) is the old version of a GPS system. It calculates the shortest path. In the new version in Fig. 2 (right), the new program not only shows the shortest path, but also finds the most economic path. Three update points are inserted in function Query from Line 24 to Line 30.

**Fig. 3.** The shortest path before and after updating (Color figure online)

**Simulating a dynamic updating scenario.** Given an original C program annotated with update points, KupC can compile it with a patch file and generate binary code that is executable on K. During execution, updating is applied once reaching a safe updating point. It simulates the behavior of a dynamic updating to a program that is running on a real-world operating system.

Figure 3 shows the results of the simulation. Figures 3(a) and (b) show the original graph and the updated graph, respectively. When the update takes place at point1, the output of the first call is the red path in Fig. 3(a), while the second call produces two paths as shown in Fig. 3(b). The red one is the shortest path and the green one is the most economic path.


**Fig. 4.** All possible updating results searched by the state space explorer of KupC

**Exploring all dynamic updating results.** In addition to simulating one possible updating scenario, KupC can search for all possible updating results by exploring each possible updating point using the state space explorer.

We compile and execute the program map with the option UPSEARCH=1 to invoke the state exploration function. Figure 4 shows all five different updating results. The outputs are divided into two parts by a semicolon, representing the results of the two function calls of Query, respectively. Case 1 and Case 2 show the results when updating occurs at point1. Case 3 and Case 4 are for point3. Case 5 shows the result when updating is not performed.

While the dynamic updating occurs during the first call of the function Query at point3 in Case 3, the output of the first call is not affected by updating. The reason is that the updated content will not take effect until the next access after updating. Therefore, the outputs in Case 4 are exactly the same as the ones in Case 5. Updating at point2 violates the safety policies. Therefore, there is no case corresponding to point2. All the updating results searched are valid.
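The five cases can be reproduced by a simple enumeration under one plausible reading of the example: Query is called twice, each call passes the three updating points, the update is applied at most once, and point2 is rejected by the safety policy. The sketch below only enumerates where the update may happen; it does not model the program outputs, and the function and its parameters are hypothetical.

```python
def update_scenarios(calls, points, unsafe):
    """Enumerate (point, call) pairs at which a single update may be applied.

    The final entry None stands for the run in which no update is performed.
    """
    scenarios = [
        (point, call)
        for point in points
        if point not in unsafe          # unsafe points never yield a case
        for call in range(1, calls + 1)
    ]
    scenarios.append(None)
    return scenarios


cases = update_scenarios(2, ["point1", "point2", "point3"], {"point2"})
```

Under these assumptions the enumeration yields two cases for point1, two for point3, and one without updating, which matches the five results reported above.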

**Model checking temporal properties.** Dynamic updating is a temporal behavior in that the properties before and after updating may be different. Such differences can be formalized as temporal properties. Another attractive function of KupC is to verify these temporal properties using LTL model checking.

As an example, we verify whether or not updating in the GPS example can be finally deployed. First, we introduce an atomic proposition called \_\_update, which is false before updating and becomes true after the program is updated. Given the command UPLTLMC = "TrueLtl ULtl \_\_update" ./map, KupC returns true, indicating that updating can be eventually performed.

Another property of interest is that the shortest path must become 7 after the system is updated. It can be defined as an LTL formula \_\_update->(<>(x==7)), where variable *x* stores the value of the shortest path. Given the command UPLTLMC="'('~Ltl\_\_update'\'/Ltl'('TrueLtlULtl'('x==7')'')'')'"./map, KupC returns true, indicating that the updating result is correct as expected.
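The two commands encode their properties using only the *until* operator and propositional connectives. Assuming that `TrueLtl`, `ULtl`, `~Ltl`, and `\/Ltl` denote the constant true, until, negation, and disjunction, as their spelling suggests, the encodings correspond to the following standard LTL equivalences, where $\Diamond$ is the *eventually* operator written `<>` above:

$$\Diamond\,\mathit{update} \;\equiv\; \mathit{true}\ \mathrm{U}\ \mathit{update} \qquad\qquad \mathit{update} \rightarrow \Diamond(x = 7) \;\equiv\; \neg\,\mathit{update} \,\vee\, \big(\mathit{true}\ \mathrm{U}\ (x = 7)\big)$$

The first equivalence is the usual definition of *eventually* in terms of *until*; the second unfolds the implication into a disjunction before applying the same definition.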

## **4 Concluding Remarks and Ongoing Work**

We have presented the design and implementation of an operational semantics-based verification tool called KupC for dynamic software updating. Three case studies showed the effectiveness of KupC for the formal analysis of the dynamic software updating of C programs by simulation, state exploration, and LTL model checking. Semantics-based formalization is promising in providing effective and practical solutions for guaranteeing the correctness of dynamic software updating. For instance, Lounas *et al.* achieved formal verification of dynamic updating of Java programs based on Java's semantics [7]. Compared with their approach, our approach is more general and extensible, as K provides an elegant semantic framework for the definition of programming languages and an easy-to-use service for generating automated verification tools.

KupC is well positioned for practical code-level verification of DSU. It is directly applicable to the code and shows the feasibility of formalizing a dynamic updating mechanism on the basis of the operational semantics of target programming languages. To verify the dynamic updating of more complex and practical programs, a complete semantics of C including those of standard libraries is needed. The efficiency of KupC also needs to be examined, although the efficiency of K has been validated [9]. There is ongoing work in these directions.

KupC has some limitations because of theoretical and practical challenges in the formal verification of DSU. Theoretically, Gupta *et al.* have shown the undecidability of the reachability of updating points [3]. Another issue is that there is no uniform definition of *correctness* of dynamic updating. The logical correctness of dynamic updating depends on the target programs, and its formalization relies on programmers' interpretation. Although KupC does not require any abstraction of target programs, we suspect that some abstraction is necessary for optimizing the efficiency and scalability of the verification. For instance, a function that is not modified in a new version can be considered atomic for verification purposes. Finding an abstraction of target programs that scales while maintaining the validity of verification is still an ongoing quest.

## **References**



# **Business Process Privacy Analysis in PLEAK**

Aivo Toots<sup>1,2</sup>, Reedik Tuuling<sup>1</sup>, Maksym Yerokhin<sup>2</sup>, Marlon Dumas<sup>2</sup>, Luciano García-Bañuelos<sup>2</sup>, Peeter Laud<sup>1</sup>, Raimundas Matulevičius<sup>2</sup>, Alisa Pankova<sup>1</sup>, Martin Pettai<sup>1</sup>, Pille Pullonen<sup>1,2(B)</sup>, and Jake Tom<sup>2</sup>

<sup>1</sup> Cybernetica AS, Tallinn, Estonia, {aivo.toots,reedik.tuuling,peeter.laud,alisa.pankova,martin.pettai,pille.pullonen}@cyber.ee

<sup>2</sup> University of Tartu, Tartu, Estonia, {aivo.toots,maksym.yerokhin,marlon.dumas,luciano.garcia-banuelos,raimundas.matulevicius,pille.pullonen,jake.tom}@ut.ee

**Abstract.** Pleak is a tool to capture and analyze privacy-enhanced business process models to characterize and quantify to what extent the outputs of a process leak information about its inputs. Pleak incorporates an extensible set of analysis plugins, which enable users to inspect potential leakages at multiple levels of detail.

## **1 Introduction**

Data minimization is a core tenet of the European General Data Protection Regulation (GDPR) [2]. According to GDPR, usage of private data should be limited to the purpose for which it has been collected. To verify compliance with this principle, privacy analysts need to determine who has access to the data and what private information these data may disclose. Business process models are a rich source of metadata to support this analysis. Indeed, these models capture which tasks are performed by whom, what data are taken as input and output by each task, and what data are exchanged with external actors. Process models are usually captured using the Business Process Model and Notation (BPMN).

This paper introduces Pleak<sup>1</sup>, the first tool to analyze privacy-enhanced BPMN models in order to characterize and quantify to what extent the outputs of a process leak information about its inputs. Pleak supports three levels of analysis. The top level, the Boolean level (Sect. 2), tells us whether or not a given data object in the process may reveal information about a given input. The middle level, the qualitative level (Sect. 3), goes further by indicating which attributes of (or functions over) a given input data object are potentially leaked by each output, and under what conditions this leakage may occur. The lower level quantifies to what extent a given output leaks information about an input, either in terms of a sensitivity measure (Sect. 4) or in terms of the guessing advantage that an attacker gains by having the output (Sect. 5).

<sup>1</sup> https://pleak.io (account: *demo@example.com*, password: *pleakdemo*, manual: https://pleak.io/wiki/, source code: https://github.com/pleak-tools/).

© The Author(s) 2019

R. Hähnle and W. van der Aalst (Eds.): FASE 2019, LNCS 11424, pp. 306–312, 2019. https://doi.org/10.1007/978-3-030-16722-6\_18

**Fig. 1.** Aid distribution process

To illustrate the capabilities of Pleak, we refer to an "aid distribution" process in Fig. 1. This process starts when a nation requests aid from the international community to handle an emergency and a country offers to route a ship to help transport people and/or goods. The goal of the process is to allocate a port and a berth to the ship but not to reveal information about ships that are unable to help or the parameters of the ports. The process uses a type of privacy-enhancing technology (PET) known as secure multiparty computation (MPC). MPC allows participants to perform joint computations such that none of the parties gets to see the data of the other parties, but can learn the output depending on the private inputs. Given a ship, a deadline and the list of ports, task "Compute reachable ports" retrieves the list of ports reachable by the deadline. Tasks with identical names in different pools denote MPC computations carried out jointly by multiple stakeholders. Task "Select feasible ports" retrieves ports with the capacity to host the ship. The third task selects a port, a berth, and a slot for the ship, and discloses them to both participants.

*Related Work.* We are interested in privacy analysis of business processes, and in this space Anica [1] is closest to our work. However, Pleak's analysis is more fine-grained. Anica allows designers to see that a given object O1 may contain information derived from a sensitive data object O2, but it can explain neither how the data in O1 is derived from O2 (cf. Leaks-When analysis) nor to what extent the data in O1 leaks information from O2 (cf. sensitivity and guessing advantage analysis). In addition, Anica focuses on security levels, whereas our high-level analysis looks at the PETs deployed in the process.

## **2 PE-BPMN Editor and Simple Disclosure Analysis**

The model in Fig. 1 is captured in Privacy-Enhanced BPMN (PE-BPMN) [7,8]. PE-BPMN uses stereotypes to distinguish the PETs in use, e.g. MPC or homomorphic encryption, which affect which data is protected in the process. The PE-BPMN editor allows users to attach stereotypes to model elements and to enter the stereotype's parameters where applicable. The editor integrates a checker, which verifies stereotype-specific restrictions. For example, it checks that: (1) when a task has an MPC stereotype, there is at least one other "twin" task with the same label in another pool, since an MPC computation involves at least two parties; (2) when one of these tasks is enabled, the other twin tasks are eventually enabled; and (3) the joint computation has at least one input and one output.
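The structural restrictions (1) and (3) can be illustrated with a small sketch. It is not Pleak's checker: the `Task` record, its field names, and the pool names in the example are assumptions made for illustration, and restriction (2) is omitted because it needs the execution semantics of the process model.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical, simplified view of a PE-BPMN task.
    label: str
    pool: str
    stereotype: str = ""                      # e.g. "MPC"; empty means no PET
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def check_mpc_tasks(tasks):
    """Return human-readable violations of restrictions (1) and (3)."""
    violations = []
    mpc_tasks = [t for t in tasks if t.stereotype == "MPC"]
    for label in sorted({t.label for t in mpc_tasks}):
        group = [t for t in mpc_tasks if t.label == label]
        # Restriction (1): twins with the same label must sit in >= 2 pools.
        if len({t.pool for t in group}) < 2:
            violations.append(f"{label}: no twin task in another pool")
        # Restriction (3): the joint computation needs an input and an output.
        if not any(t.inputs for t in group):
            violations.append(f"{label}: joint computation has no input")
        if not any(t.outputs for t in group):
            violations.append(f"{label}: joint computation has no output")
    return violations


tasks = [
    Task("Compute reachable ports", "Ship owner", "MPC", ["ship"], ["reachable ports"]),
    Task("Compute reachable ports", "Aid requester", "MPC", ["ports"], []),
    Task("Select feasible ports", "Aid requester", "MPC", ["reachable ports"], ["feasible ports"]),
]
print(check_mpc_tasks(tasks))
```

In this example the second MPC computation is reported because its label appears in only one pool.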

Given a valid PE-BPMN model, Pleak runs a binary privacy analysis, which produces a *simple disclosure report* and a data dependency matrix. The disclosure report in Fig. 2 tells us whether or not a stakeholder gets to see a given data object. In the report, "V" indicates that a data object (in columns) is visible to a stakeholder (in rows). Marker "H" (hidden) is used for data with cryptographic protection, e.g. encrypted data. Row "shared over" refers to the network service provider, who may also see some of the data (e.g. unencrypted data objects).


**Fig. 2.** Simple disclosure report for the aid distribution process in Fig. 1

## **3 Qualitative Leaks-When Analysis**

Leaks-When analysis [3] is a technique that takes as input a SQL workflow and determines, for each (output, input) pair, which attributes, if any, of the input object are disclosed by the output object and under which conditions. A SQL workflow is a BPMN process model in which every data object corresponds to a database table, defined by a table schema, and every task is a SQL query that transforms the input tables of the task into its output tables. Figure 3 shows a sample collaborative SQL workflow – a variant of the "aid distribution" example where the disclosure of information about ships to the aid-requesting country is made incrementally. The figure shows the SQL workflow alongside the query corresponding to task "Select reachable ports". All data processing tasks and input data objects are specified analogously.

To perform a Leaks-When analysis, the user selects one or more output data objects and clicks the "SQL LeaksWhen" button. The Leaks-When analysis shows one tab for each output data object and one report for each column in the output table. The report is generated by extracting all runs of the workflow and applying dataflow analysis techniques to each run in order to infer all relevant data dependencies. An example of a leaks-when report (in graphical form) is shown in Fig. 4. The first input to *Filter* is the disclosed value (leaks branch), e.g. the arrival time. The second input (when branch) is the condition under which the first input is output, e.g. that the arrival time is less than the deadline and the ship has the required name. Each Leaks-When report ends with such a filter, while the rest of the graph aggregates the computations described in SQL.

**Fig. 3.** Aid distribution SQL workflow in Pleak SQL editor

## **4 Sensitivity Analysis and Differential Privacy**

The *sensitivity of a function* is the expected maximum change in the output, given a change in the input of the function. Sensitivity is the basis for calibrating the amount of noise to be added to prevent leakages on statistical database queries using a differential privacy mechanism [6]. Differential privacy ensures that it is difficult for an attacker, who observes the query output, to distinguish between two input databases that are sufficiently "close" to each other, e.g. differ in one row. Pleak tells the user how to sample noise to achieve differential privacy, and how this affects the correctness of the output. Pleak provides two methods – global and local – to quantify sensitivity of a task in a SQL workflow or of an entire SQL workflow. These methods can be applied to queries that output aggregations (e.g. count, sum, min, max).

**Fig. 4.** Sample leaks-when report

*Global sensitivity* analysis [5] takes as input a database schema and a query, and computes the theoretical bounds for sensitivity, which are suitable for any instance of the database. This shows how the output changes if we add (remove) a row to (from) some input table. The analysis output is a matrix that shows the sensitivity w.r.t. each input table separately. It supports only COUNT queries.
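To make the link between sensitivity and noise concrete, the following sketch applies the textbook Laplace mechanism to a COUNT query: noise drawn from a Laplace distribution with scale sensitivity/ε is added to the true answer. This is the standard construction, not necessarily the exact procedure Pleak follows; the function names and the example numbers are assumptions.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace noise in the standard Laplace mechanism."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("need sensitivity >= 0 and epsilon > 0")
    return sensitivity / epsilon


def sample_laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, sensitivity, epsilon, rng):
    """Release a count with noise calibrated to the given sensitivity."""
    return true_count + sample_laplace(laplace_scale(sensitivity, epsilon), rng)


rng = random.Random(42)                 # fixed seed only for reproducibility
# Hypothetical query: a COUNT whose sensitivity w.r.t. one table is 1.
print(noisy_count(true_count=120, sensitivity=1.0, epsilon=0.5, rng=rng))
```

A smaller ε or a larger sensitivity increases the scale, which is why a tight sensitivity bound directly improves the accuracy of the released answer.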

Sometimes, the global sensitivity may be very large or even infinite. *Local sensitivity* analysis is an alternative approach, which requires as input not only a schema and a query, but also a particular instance of the underlying database, and it tells how the output changes with a change *from the given input*. Using the database instance reduces the amount of noise needed to ensure differential privacy w.r.t. the number of rows. Moreover, it supports COUNT, SUM, MIN, and MAX aggregations, and allows capturing more interesting distances between input tables, such as a change in a particular attribute of some row. In Pleak, we have investigated a particular type of local sensitivity, called *derivative sensitivity* [4], which is primarily suited to continuous functions and is closely related to the derivative of a function. Pleak uses derivative sensitivity to quantify the required amount of noise as described in [4].

An example of derivative sensitivity analysis output is shown in Fig. 5a. It shows that the derivative sensitivity w.r.t. the *Ship* table is 4, and that a differential privacy level of ε = 1 can be achieved using smoothness parameter β = 0.05. To this end, we would have to add an amount of (Laplacian) noise such that the relative error of the output is 74%. More precisely, if the correct output is y, the noised answer will be between 0.26y and 1.74y with probability 80%. A tutorial on the sensitivity analyzer can be found at https://pleak.io/wiki/sql-derivative-sensitivity-analyser. More examples can be found in the full version of this paper [9].

**Fig. 5.** Examples of quantitative analysis

## **5 Attacker's Guessing Advantage**

While function sensitivity as defined in Sect. 4 can be used directly to compute the noise required to achieve ε-differential privacy, it is in general not clear which ε is good enough, and the "goodness" depends on the data and the query [6]. We want a more standard security measure, such as guessing advantage, defined as the difference between the posterior (after observing the output) and prior (before observing the output) probabilities of the attacker guessing the input.

The *guessing advantage* analysis of Pleak takes as input the desired upper bound on the attacker's advantage, which ranges between 0% and 100%. The user specifies a particular subset of attributes that the attacker is trying to guess for some data table record, within a given precision range. The user may define the prior knowledge of the attacker, which is currently expressed as an upper and a lower bound on an attribute. The analyzer internally converts these values to a suitable ε, and computes the noise required to achieve the bound on the attacker's advantage.

Figure 5b shows example parameters and the output of this analysis. The attacker already knows that the longitude and latitude of a ship are in the range [0...300] while the speed is in [20...80]. The attacker's goal is to learn the location of any ship with a precision of 5 units. If we want to bound the guessing advantage by 30% using differential privacy, the relative error of the output will be 43.25%. For a tutorial see https://pleak.io/wiki/sql-guessing-advantage-analyser.

**Acknowledgements.** The research was funded by the Estonian Research Council under IUT27-1 and IUT20-55 and by the Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under contract FA8750-16-C-0011. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Specification, Design, and Implementation of Particular Classes of Systems

## **CLTestCheck: Measuring Test Effectiveness for GPU Kernels**

Chao Peng(B) and Ajitha Rajan

University of Edinburgh, Edinburgh, UK {chao.peng,arajan}@ed.ac.uk

**Abstract.** The massive parallelism and energy efficiency of GPUs, along with advances in their programmability with the OpenCL and CUDA programming models, have made them attractive for general-purpose computations across many application domains. Techniques for testing GPU kernels have emerged recently to aid the construction of correct GPU software. However, there exists no means of measuring the quality and effectiveness of tests developed for GPU kernels. Traditional coverage criteria for CPU programs are not adequate for GPU kernels, as GPU kernels use a completely different programming model and the faults encountered may be specific to the GPU architecture.

We address this need in this paper and present a framework, CLTestCheck, for assessing the quality of test suites developed for OpenCL kernels. The framework has the following capabilities: (1) it measures kernel code coverage using three different coverage metrics that are inspired by faults found in real kernel code; (2) it seeds different types of faults in kernel code and measures the fault finding capability of the test suite; and (3) it simulates different work-group schedules to check for potential deadlocks and data races with a given test suite. We conducted an empirical evaluation of CLTestCheck on a collection of 82 publicly available GPU kernels and test suites. We found that CLTestCheck is capable of automatically measuring the effectiveness of test suites, in terms of kernel code coverage, fault finding and revealing data races in real OpenCL kernels.

**Keywords:** Testing · Code coverage · Fault finding · Data race · Mutation testing · GPU · OpenCL

## **1 Introduction**

Recent advances in the programmability of Graphics Processing Units (GPUs), accompanied by the advantages of massive parallelism and energy efficiency, have made them attractive for general-purpose computations across many application domains [19]. However, writing correct GPU programs is a challenge for many reasons [13]: a program may spawn millions of threads, which are clustered in multi-level hierarchies, making it difficult to analyse; the programmer assumes responsibility for ensuring that concurrently executing threads do not conflict, by checking that threads access disjoint parts of memory; complex striding patterns of memory accesses are hard to reason about; and the GPU work-group execution model and thread scheduling vary from platform to platform, with assumptions that are not explicit. As a consequence of these factors, GPU programs are difficult to analyse with existing static or dynamic approaches [13]. Static techniques are thwarted by the complexity of the sharing patterns. Dynamic techniques are challenged by the combinatorial explosion of thread interleavings and the space of possible data inputs. Given these difficulties, it becomes important to understand the extent to which a GPU program has been analysed and tested, and the code portions that may need further attention.

In this paper, we focus on GPU program testing and address concerns with respect to the quality and adequacy of tests developed for GPU programs. We present a framework, CLTestCheck, that measures test effectiveness over GPU kernels written using the OpenCL programming model [7]. The framework has three main capabilities. The first capability is a technique called *schedule amplification* to check the execution of test inputs over several work-group schedules. Existing GPU architectures and simulators do not provide a means to control work-group schedules. The OpenCL specification provides no execution model for inter work-group interactions [21]. As a result, the ordering of work-groups when a kernel is launched is non-deterministic and there is, presently, no means for checking the effect of schedules on test execution. We provide this monitoring capability. For a test case *T<sub>i</sub>* in test suite *TS*, instead of simply executing it once with an arbitrary schedule of work-groups, we execute it many times with a different work-group schedule in each execution. We build a simulator that can force work-groups in a kernel execution to execute in a certain order. This is done in an attempt to reveal test executions that produce different outputs for different work-group schedules, which inevitably point to problems in inter work-group interactions.

The second capability of CLTestCheck is measuring code coverage for OpenCL kernels. The structures we chose to cover were motivated by OpenCL bugs found in public repositories like GitHub and in research papers on GPU testing. We define and measure coverage over synchronisation statements, loop boundaries and branches in OpenCL kernels.

The final capability of the framework is creating mutations by seeding different classes of faults relevant to GPU kernels. We assess the effectiveness of test suites in uncovering the seeded faults.

We empirically evaluate CLTestCheck using 82 kernels and associated test input workloads from industry standard benchmarks. The schedule amplifier in CLTestCheck was able to detect deadlocks and inter work-group data races in benchmarks. We were able to detect barrier divergence and kernel code that requires further tests using the coverage measurement capabilities of CLTestCheck. Finally, the fault seeding capability was able to expose unnecessary barriers and unsafe accesses in loops.

The CLTestCheck framework aims to help developers assess how well OpenCL kernels have been tested, identify kernel regions that require further testing, and uncover bugs sensitive to work-group schedules. In summary, the main contributions in this paper are:


The rest of this paper is organised as follows. We present background on the OpenCL programming model in Sect. 2. Related work in GPU program testing and verification is discussed in Sect. 3. CLTestCheck capabilities are discussed in Sect. 4. The experiment setup and the results of our empirical evaluation are discussed in Sects. 5 and 6, respectively.

## **2 Background**

The success of GPUs in the past few years has been due to the ease of programming using the CUDA [17] and OpenCL [7] parallel programming models, which abstract away details of the architecture. In these programming models, the developer uses a C-like programming language to implement algorithms. The parallelism in those algorithms has to be exposed explicitly. We now present a brief overview of the core concepts of OpenCL, the programming model used in this paper.

OpenCL is a programming framework and standard from Khronos for heterogeneous parallel computing on cross-vendor and cross-platform hardware. In the OpenCL architecture, a CPU-based *Host* controls multiple *Compute Devices* (for instance, CPUs and GPUs are different compute devices). Each of these coarse-grained compute devices consists of multiple *Compute Units*, which in turn contain one or more *processing elements* (a.k.a. *streaming processors*). The processing elements execute groups of individual threads, referred to as work-groups, concurrently. The functions executed by the GPU threads are called *kernels*, parameterised by thread and group id variables. OpenCL has four types of memory regions: global and constant memory shared by all threads in all work-groups, local memory shared by threads within the same work-group, and private memory for each thread. Kernels cannot write to the constant memory.

GPUs have a SIMT (single instruction, multiple thread) execution model that executes batches of threads (warps) in *lock-step*, i.e. all threads in a work-group execute the same instruction but on different data. If the control flow of threads within the same work-group diverges, the different execution paths are scheduled sequentially until the control flows reconverge and lock-step execution resumes. Sequential scheduling caused by divergence results in a performance penalty, slowing down execution of the kernel.

Betts et al. [2] describe two specific classes of bugs that make GPU kernels harder to verify than sequential code: data races and barrier divergence. An *inter work-group data race* occurs when a global memory location is written by one or more threads from one work-group and accessed by one or more threads from another work-group. An *intra work-group data race* occurs when a global or local memory location is written by one thread and accessed by another thread from the same work-group. A barrier is a synchronisation mechanism for threads within a work-group in OpenCL and is used to prevent intra work-group data race errors. *Barrier divergence* occurs if threads in the same group reach different barriers, in which case kernel behaviour is undefined [2] and may lead to intra work-group data races.
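The two race definitions can be restated as a small classifier over a log of memory accesses. This is a sketch for illustration only: the `Access` record and its fields are assumptions, and the check deliberately ignores barriers and atomic operations, so it over-approximates the races that a real analysis would report.

```python
from collections import namedtuple
from itertools import combinations

# One logged memory access; all field names are illustrative assumptions.
Access = namedtuple("Access", "group thread space address is_write")


def classify_races(accesses):
    """Return (intra, inter) lists of conflicting access pairs."""
    intra, inter = [], []
    for a, b in combinations(accesses, 2):
        if a.space != b.space or a.address != b.address:
            continue                      # different locations never conflict
        if a.space not in ("global", "local"):
            continue                      # private/constant memory is out of scope
        if not (a.is_write or b.is_write):
            continue                      # two reads do not race
        if a.group == b.group:
            if a.thread != b.thread:
                intra.append((a, b))      # same work-group, different threads
        elif a.space == "global":
            inter.append((a, b))          # local memory is not shared across groups
    return intra, inter


log = [
    Access(group=0, thread=0, space="global", address=4, is_write=True),
    Access(group=1, thread=0, space="global", address=4, is_write=False),
    Access(group=0, thread=0, space="local", address=0, is_write=True),
    Access(group=0, thread=1, space="local", address=0, is_write=False),
]
intra, inter = classify_races(log)
print(len(intra), len(inter))
```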

In this paper, we focus on covering barrier functions to help detect intra work-group barrier divergence errors, and on revealing problems with inter work-group interactions using work-group schedule amplification.

## **3 Related Work**

We discuss related work in the context of work-group synchronisation, verification and testing of GPU programs.

*Inter Work-group Synchronisation for OpenCL Kernels.* Barrier functions in the OpenCL specification [7] help synchronise threads within the same work-group. There is no mechanism, however, to synchronise threads belonging to different work-groups. One solution for this problem is to split a program into multiple kernels, with the CPU executing the kernels in sequence and thereby providing implicit synchronisation. The drawback of this method is the overhead incurred in launching multiple kernels. Xiao et al. [24] proposed an implementation of an inter work-group barrier that relies on information on the number of work-groups. This method is not portable as the number of launched work-groups depends on the device. Sorensen et al. [22] extended it to be portable by discovering work-group occupancy dynamically. Their implementation of inter work-group barrier synchronisation is useful when the developer knows there is interaction between work-groups that needs to be synchronised. Our contribution is in detecting undesired inter work-group interactions, not intended by the developer.

*GPU Kernel Verification.* Verification of GPU kernels to detect data races and barrier divergence bugs has been explored in the past. Li et al. [14] introduced a Satisfiability Modulo Theories (SMT) based approach for analysing GPU kernels and developed a tool called Prover of User GPU (PUG). The main drawback of this approach is scalability. With an increasing number of threads, the number of possible thread interleavings grows exponentially, making the analysis infeasible for large numbers of threads. GRace [25] and GMRace [26] were developed for CUDA programs to detect data races using both static and dynamic analysis. However, they do not support detection of inter work-group data races.

GKLEE [15] and KLEE-CL [3], based on dynamic symbolic execution, provide data race checks for CUDA and OpenCL kernels, respectively. Both tools are restricted by the need to specify a certain number of threads, and by the lack of support for custom synchronisation constructs. Scalability and general applicability are challenges with these tools.

Leung et al. [13] present a flow-based test amplification technique for verifying race freedom and determinism of CUDA kernels. For a single test input under a particular thread interleaving, they log the behaviour of the kernel and check the property. They then amplify the result of the test to hold over all the inputs that have the same values for the property integrity-inputs. The test amplification approach in [13] can check the absence of data-races, not the presence. Additionally, their approach amplifies across the space of test inputs, not work-group schedules as done in our schedule amplifier. GPUVerify [2] is a static analysis tool that transforms a parallel GPU kernel into a two-threaded predicated program with lock-step execution and checks data races over this transformed model. The drawback of GPUVerify is that it may report false alarms and has limited support for atomic operations.

*Test Effectiveness Measurement.* Measuring the effectiveness of tests in terms of code coverage and fault finding is common for CPU programs [6,18]. Support for GPU programs is scarce. GKLEE is the only tool that provides support for code coverage for CUDA GPU kernels. Given a kernel, it converts it into its sequential C program version (using Perl scripts) and applies the Gcov utility supplied with GCC for measuring code coverage. This form of coverage measurement disregards the GPU programming model. In our approach, we measure coverage conforming to the OpenCL programming model. With respect to fault seeding and schedule amplification, we are not aware of any existing work that provides these capabilities for GPU kernels to help measure the effectiveness of test suites. The CLTestCheck framework is discussed next, in Sect. 4.

## **4 Our Approach**

In this section, we present the CLTestCheck framework that provides capabilities for kernel code coverage measurement, mutant generation and schedule amplification. To understand the kinds of programming bugs<sup>1</sup> encountered by OpenCL developers, we surveyed several publicly available OpenCL kernels and associated bug fix commits. A summary of our findings is shown in Table 1. We found that bugs most commonly occur in the following OpenCL code constructs: barriers, loops, branches, global memory accesses and arithmetic computations. We seek to aid the developer in assessing the quality of test suites in revealing these bug types using CLTestCheck. A detailed discussion of CLTestCheck capabilities is presented in the following sections.

#### **4.1 Kernel Code Coverage**

We define coverage over barriers, loops and branches in OpenCL code to check rigour of test suites in exercising these code structures.

<sup>1</sup> These are kernel bugs that violate the specification of the program or are associated with executions that lead to undefined behaviour.

**Table 1.** Summary of bug fixing commits we collected

*Branch Coverage.* GPU programs are highly parallelised and are executed by numerous processing elements, each of which executes groups of threads in lock-step. This is very different from parallelism in CPU programs, where each thread executes different instructions without the implicit synchronisation seen in lock-step execution. Kernel code is the same for all threads; however, the threads may diverge, following different branches based on the input data they process. As seen in Table 1, uncovered branches and branch conditions are an important class of OpenCL bugs. Lidbury et al. [16] report in their work that branch coverage measurement is crucial for GPU programs but is currently lacking. To address this need, we define branch coverage for GPU programs as follows:

$$\text{branch coverage} = \frac{\#\text{covered branches}}{\text{total } \#\text{branches}} \times 100\% \tag{1}$$

*Branch coverage* measures adequacy of a test suite by checking if each branch of each control structure in GPU code has been executed by at least one thread.
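Equation (1) can be evaluated from per-thread execution records as in the sketch below. The record layout (a mapping from thread id to the set of branch ids that thread executed) and the branch names are assumptions for illustration; the actual instrumentation is described in Sect. 4.4.

```python
def branch_coverage(executed_by_thread, total_branches):
    """Percentage of branches executed by at least one thread (Eq. 1).

    executed_by_thread maps a thread id to the set of branch ids it took.
    """
    if total_branches <= 0:
        raise ValueError("the kernel must contain at least one branch")
    # A branch counts as covered as soon as any thread has executed it.
    covered = set().union(*executed_by_thread.values())
    return 100.0 * len(covered) / total_branches


# Two if/else statements give four branches in total (hypothetical ids).
records = {
    0: {"if1.then"},
    1: {"if1.else", "if2.then"},
}
print(branch_coverage(records, total_branches=4))   # 3 of 4 branches: 75.0
```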

*Loop Boundary Coverage.* In our survey of kernel bugs shown in Table 1, we found bugs related to loop boundary values and loop conditions were fairly common. For instance, bug #3 found in the mcxcl program allowed the loop index to access memory locations beyond the end of the array due to an erroneous loop condition. We assess adequacy of test executions with respect to loops by considering the following cases,


$$\text{loop boundary coverage}_{case_i} = \frac{\#\text{loops satisfying } case_i}{\text{total } \#\text{loops}} \times 100\% \tag{2}$$

where *case<sub>i</sub>* refers to one of the four loop execution cases listed above.

*Barrier Coverage.* Barrier divergence occurs when the number of threads within a work-group executing a barrier is not the same as the total number of threads in that work-group. Kernel behaviour with barrier divergence is undefined. Barrier-related bugs, such as missing barriers and unnecessary barriers, are a common class of GPU bugs according to our survey. We define barrier coverage as follows:

$$\text{barrier coverage} = \frac{\#\text{covered barriers}}{\text{total } \#\text{barriers}} \times 100\% \tag{3}$$

*Barrier coverage* measures adequacy of a test suite by checking if each barrier in GPU code is executed correctly. A *covered barrier* is one that is executed correctly, i.e. without barrier divergence: it is executed by *all* threads in any given work-group.
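A minimal sketch of Eq. (3) follows. It rests on one reading of the definition, stated here as an assumption: a barrier is covered if at least one work-group reaches it and, in every work-group that reaches it, all threads of that group execute it. The record layout is likewise hypothetical.

```python
def barrier_coverage(hits, group_size, total_barriers):
    """Return (coverage percentage, sorted list of diverged barriers).

    hits maps (barrier_id, group_id) to the set of local thread ids that
    executed that barrier in that work-group.
    """
    if total_barriers <= 0:
        raise ValueError("the kernel must contain at least one barrier")
    reached, diverged = set(), set()
    for (barrier, _group), threads in hits.items():
        if threads:
            reached.add(barrier)
            if len(threads) != group_size:
                diverged.add(barrier)      # only part of the group arrived
    covered = reached - diverged
    return 100.0 * len(covered) / total_barriers, sorted(diverged)


hits = {
    ("b0", 0): {0, 1, 2, 3},
    ("b0", 1): {0, 1, 2, 3},
    ("b1", 0): {0, 1},                     # two of four threads: divergence
}
print(barrier_coverage(hits, group_size=4, total_barriers=2))
```

Reporting the diverged barriers alongside the percentage points the developer at the synchronisation statements that need attention.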

#### **4.2 Fault Seeding**

Mutation testing is known to be an effective means of estimating the fault finding effectiveness of test suites for CPU programs [9]. We generate mutations using traditional mutant operators, namely, arithmetic, relational, bitwise, logical and assignment operator types. In Table 1, bug fixes #3, #7 and #8 show that traditional arithmetic and relational operator mutations remain applicable to GPU programs. In addition, we define three mutations specifically for OpenCL kernels: barrier mutation, image access mutation and loop boundary mutation inspired by bug fixes #1 to #5.

The barrier mutation operator we define is deletion of an existing barrier function call, to reproduce bugs similar to #1 and #2 in Table 1. OpenCL provides 2D and 3D image data structures to facilitate access to images. Multidimensional arrays are not supported in OpenCL. Image structures are accessed using read and write functions that take the pixel coordinates in the image as parameters. We perform image access mutations for 2D or 3D coordinates by increasing or decreasing one of the coordinates or exchanging coordinates. Finally, we define loop boundary mutations as either (1) skipping the loop, (2) allowing n-1 iterations of the loop, or (3) allowing n+1 iterations of the loop, where n is the number of iterations when the loop boundary is reached. The mutant operators we use in this paper are summarised in Table 2.
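Two of these operators, barrier deletion and relational operator replacement, can be sketched as text rewrites that produce one mutant per occurrence. This is a deliberately simplified, regex-based illustration: CLTestCheck itself works on the Clang AST (Sect. 4.4), the replacement table is an assumption, and a textual match can be fooled by comments, string literals or preprocessor lines.

```python
import re

# Matches a call such as "barrier(CLK_LOCAL_MEM_FENCE);".
BARRIER_CALL = re.compile(r"\bbarrier\s*\([^;]*\)\s*;")
# Relational operators, excluding shifts (<<, >>) and the arrow (->).
RELATIONAL = re.compile(r"(?<![<>=!-])(<=|>=|==|!=|<|>)(?![<>=])")
# One possible replacement per operator (an illustrative choice).
REPLACEMENTS = {"<": "<=", "<=": "<", ">": ">=", ">=": ">", "==": "!=", "!=": "=="}


def mutants(source, pattern, replace):
    """Yield one mutant per match, each changing exactly one occurrence."""
    for m in pattern.finditer(source):
        yield source[:m.start()] + replace(m.group()) + source[m.end():]


def barrier_mutants(source):
    # Keep the ';' so that "if (c) barrier(...);" stays syntactically valid.
    return list(mutants(source, BARRIER_CALL, lambda _call: ";"))


def relational_mutants(source):
    return list(mutants(source, RELATIONAL, lambda op: REPLACEMENTS[op]))


kernel_src = """
__kernel void scale(__global float *a, int n) {
    int i = get_global_id(0);
    if (i < n) a[i] = a[i] * 2.0f;
    barrier(CLK_GLOBAL_MEM_FENCE);
}
"""
print(len(barrier_mutants(kernel_src)), len(relational_mutants(kernel_src)))
```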


**Table 2.** Summary of mutation operators

#### **4.3 Schedule Amplification**

When a kernel execution is launched, the GPU schedules work-groups on compute units in a certain order. Presently, there is no provision for determining this schedule or setting it in advance. The scheduler makes the decision on the fly, subject to the availability of compute units and the readiness of work-groups for execution. The order in which work-groups are executed with the same test input can differ every time the kernel is executed. The OpenCL specification has no execution model for inter work-group interactions and provides no guarantees on how work-groups are mapped to compute units. In our approach, we execute each test input over a set of schedules. In each schedule, we fix the work-group that should execute first. All other work-groups wait until it has finished execution. The work-group going first is picked so that we achieve a uniform distribution over the entire range of work-groups in the set of schedules. The order of execution for the remaining work-groups is left to the scheduler. For a test case *T* over a kernel with *G* work-groups, we generate *N* schedules, with *N* < *G*, such that a different work-group is executed first in each of the *N* schedules. The number of schedules, *N*, that we generate is much smaller than the total number of schedules, which is typically infeasible to check. We only fix the first work-group in the schedule because most data races or deadlocks involve interactions between two work-groups. Fixing one of them and picking a different work-group each time significantly reduces the search space of possible schedules. We cannot provide guarantees with this approach. However, with little extra cost we are able to check significantly more schedules than is currently possible. We believe this approach will be effective in revealing issues, if any, in inter work-group interactions.

To illustrate this, we consider a kernel co running on four work-groups. The CLTestCheck schedule amplifier will insert code on the host and GPU side, shown in Listings 1.1 and 1.2, to generate different work-group schedules.

In this example, before the GPU kernel is launched, the host side generates a random value in the range of available work-group ids. This value is the id of the selected work-group to be executed first and is passed to the kernel code using a macro definition. On the kernel side, each thread determines if it belongs to the selected work-group. Threads in the selected work-group proceed with executing the kernel code while threads belonging to other work-groups wait. After the selected work-group completes execution, the remaining work-groups execute the original kernel in an order based on mapping to available compute units (occupancy bound execution model [22]). With different work-group schedules generated by the schedule amplifier, we were able to detect the presence of *inter* work-group data races using a *single* GPU platform. Betts et al. [2], on the other hand, focus on intra work-group data races on different GPU platforms.
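The effect of running a selected work-group first can be mimicked with a small host-side simulation. This is not the instrumentation inserted by CLTestCheck: work-groups are run one after another rather than concurrently, the scheduler's choice for the remaining groups is replaced by a random shuffle, and both example kernels are hypothetical.

```python
import random


def w_first_schedule(num_groups, selected, rng):
    """Order in which work-groups run: `selected` first, the rest shuffled."""
    rest = [g for g in range(num_groups) if g != selected]
    rng.shuffle(rest)                  # stands in for the scheduler's choice
    return [selected] + rest


def run_kernel(kernel, num_groups, schedule):
    """Run `kernel(group_id, global_mem)` one work-group at a time."""
    global_mem = [0] * num_groups      # one global cell per work-group
    for group_id in schedule:
        kernel(group_id, global_mem)
    return global_mem


def amplify(kernel, num_groups, selected_groups, rng):
    """Return the set of distinct outputs seen over the given schedules."""
    outputs = set()
    for selected in selected_groups:
        schedule = w_first_schedule(num_groups, selected, rng)
        outputs.add(tuple(run_kernel(kernel, num_groups, schedule)))
    return outputs


def racy_kernel(group_id, global_mem):
    # Unsynchronised check-then-write on a location shared by all groups.
    if global_mem[0] == 0:
        global_mem[0] = group_id + 1


def safe_kernel(group_id, global_mem):
    # Each work-group writes only to its own cell.
    global_mem[group_id] = group_id + 1


rng = random.Random(0)
print(len(amplify(racy_kernel, 4, range(4), rng)))   # several distinct outputs
print(len(amplify(safe_kernel, 4, range(4), rng)))   # a single output
```

More than one distinct output for the same test input signals that the result depends on the work-group schedule.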

#### **4.4 Implementation**

CLTestCheck is implemented using Clang LibTooling [12]. We instrument OpenCL kernel source code to measure coverage, generate mutations and multiple work-group schedules automatically. Our implementation is available at https://github.com/chao-peng/CLTestCheck.

**Coverage Measurement.** To record branches, loops and barriers executed within each kernel when running tests, we instrument the kernel code with data structures and statements recording the execution of these code structures. For each work-group, we introduce three local arrays, whose size is determined by the number of branches, loops and barriers accessible by threads in that work-group. To measure branch coverage, we add statements at the beginning of each then- and else-branch to record whether that branch is enabled. Similarly, statements to record the number of iterations of loops are added at the beginning of each loop body. At the end of the kernel, the information contained in the data structures is processed to compute coverage.

**Fault Seeder and Mutant Execution.** The CLTestCheck fault seeder generates mutants and executes them with each of the tests in the test suite to compute mutation score, as the fraction of mutants killed. The CLTestCheck fault seeder translates the target kernel source code into an intermediate form where all the applicable operators are replaced by a template string containing the original operator, its ID and type. The tool then generates mutants from this intermediate form. Once mutants are generated, the tool executes each of the mutant files and checks if the test suite kills the mutant. We term the mutant as killed if one of the following occurs: program crashes, deadlocks or produces a result different from the original kernel code.
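The kill criterion and the mutation score can be written down as follows. The outcome encoding is an assumption made for this sketch; in particular, how a deadlock is recognised in practice (for example by a timeout) is not specified here.

```python
def is_killed(original_result, mutant_outcome):
    """mutant_outcome is ("crash", None), ("deadlock", None) or ("ok", result)."""
    status, result = mutant_outcome
    return status in ("crash", "deadlock") or result != original_result


def mutation_score(original_results, mutant_outcomes):
    """Fraction of mutants killed by at least one test in the suite.

    original_results[t] is the original kernel's result on test t;
    mutant_outcomes[m][t] is the outcome of mutant m on test t.
    """
    if not mutant_outcomes:
        raise ValueError("no mutants were generated")
    killed = sum(
        any(is_killed(original_results[t], outcome)
            for t, outcome in enumerate(per_test))
        for per_test in mutant_outcomes
    )
    return killed / len(mutant_outcomes)


original = [10, 20]                               # results on two tests
outcomes = [
    [("ok", 10), ("ok", 20)],                     # survives both tests
    [("ok", 10), ("ok", 21)],                     # different result on test 1
    [("crash", None), ("ok", 20)],                # crashes on test 0
    [("deadlock", None), ("ok", 20)],             # deadlocks on test 0
]
print(mutation_score(original, outcomes))         # 3 of 4 mutants killed: 0.75
```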

**Schedule Amplification.** As mentioned earlier, we generate several schedules for each test execution by requiring a target work-group to execute the kernel code first and then allowing other work-groups to proceed. The target work-group is selected uniformly across the input space of work-group ids. To achieve coverage of this input space, we partition work-group ids into sets of 10 work-groups. Thus, if we have $N$ work-groups, we partition them into $N/10$ sets. The first set has work-group ids 0 to 9, the second set has ids 10 to 19, and so on. We then randomly pick a target work-group, $W_t$, from each of these sets to go first and generate a corresponding schedule of work-groups, $\{W_t, S_{N-1}\}$, where $S_{N-1}$ refers to the schedule of the remaining $N-1$ work-groups generated by the GPU execution model, which is non-deterministic. For $N/10$ sets of work-groups, we will have $N/10$ schedules of the form $\{W_t, S_{N-1}\}$ (a $W_t$-first schedule). The test input is executed using each of these $N/10$ $W_t$-first schedules. Due to the non-deterministic nature of $S_{N-1}$, we repeat the test execution with a chosen $W_t$-first schedule 20 times. This enables us to check whether the execution model generates different $S_{N-1}$ and to evaluate executions with 20 such orderings.
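The target selection can be sketched as follows. This is a host-side illustration in Python with hypothetical function names; how the last set is handled when the number of work-groups is not a multiple of 10 is an assumption, since the text does not say.

```python
import random


def pick_targets(num_groups, set_size=10, rng=random):
    """Pick one target work-group id uniformly from each consecutive set of ids."""
    targets = []
    for start in range(0, num_groups, set_size):
        end = min(start + set_size, num_groups)  # assumption: last set may be smaller
        targets.append(rng.randrange(start, end))
    return targets


def amplified_runs(num_groups, repeats=20, set_size=10, rng=random):
    """Return the (target id, repetition) pairs that one test input is executed with."""
    targets = pick_targets(num_groups, set_size, rng)
    return [(target, rep) for target in targets for rep in range(repeats)]


runs = amplified_runs(num_groups=40, rng=random.Random(0))
print(len(runs))  # 4 sets x 20 repetitions
```

Only the first work-group of each schedule is controlled; the order of the remaining work-groups is left to the GPU, which is why each target is repeated.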

## **5 Experiment**

In our experiment, we evaluate the feasibility and effectiveness of the coverage metrics, fault seeder and work-group schedule amplifier proposed in Sect. 4 using OpenCL kernels from industry standard benchmark families and their associated test suites. We investigate the following questions:


For each benchmark, we generate all possible mutants by analysing the kernel source code and applying the mutation operators, discussed in Sect. 4, to eligible locations. We then assess the number of mutants killed by the tests associated with each benchmark. To check if a mutant is killed, we compare execution results between the original program and the mutant.

**Q3. Deadlocks and Data Races:** *Can the tests in the test suite give rise to unusual behaviour in the form of deadlocks or data races?* Deadlocks occur when two or more work-groups are waiting on each other for a resource. Inter work-group data races occur when test executions produce different outputs for different work-group schedules. For each test execution in each benchmark, we generate 20 × *N*/10 different work-group schedules, where *N* is the total number of work-groups for the kernel, and check if the outputs from the execution change based on the work-group schedule.
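A minimal sketch of the output comparison, in Python: the data layout (a mapping from the target work-group id to the outputs of its repeated runs) and the use of a reference output `expected` are assumptions made for illustration.

```python
def inconsistent_targets(outputs_by_target, expected):
    """Map each target whose runs did not all match `expected` to its match rate.

    outputs_by_target maps a target work-group id to the outputs observed
    over the repeated executions in which that work-group went first.
    """
    report = {}
    for target, outputs in outputs_by_target.items():
        rate = sum(1 for out in outputs if out == expected) / len(outputs)
        if rate < 1.0:
            report[target] = rate
    return report


observed = {
    0: [(1, 2)] * 4,                         # stable output
    17: [(1, 2), (1, 3), (1, 2), (9, 2)],    # output depends on the schedule
}
print(inconsistent_targets(observed, expected=(1, 2)))
```

An empty report means no schedule-dependent output was observed; it does not prove the absence of data races.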

*Subject Programs.* We used the following benchmarks for our experiments:

1. nine scientific benchmarks with 23 OpenCL kernels from the Parboil benchmark suite [23],
2. the scan benchmark [20], with 3 kernels, which computes a parallel prefix sum,
3. five applications containing 13 kernels from the Rodinia benchmark suite for heterogeneous computing,
4. 20 benchmarks from PolyBench with 43 kernels spanning linear algebra, data mining and stencil computations.

We ran our experiments on an Intel CPU (i5-6500) and GPU (HD Graphics 530) using OpenCL SDK 2.0.

## **6 Results and Analysis**

For each of the subject programs presented in Sect. 5, we ran the associated test suites and report results in terms of coverage achieved, fault finding and overhead incurred with the CLTestCheck framework. We executed the test suites 20 times for each measurement. Our results in the context of the questions in Sect. 5 are presented below.

#### **6.1 Coverage Achieved**

Branch and loop coverage (with 0, exactly 1 and >1 iterations) for each of the subject programs in the three benchmark suites<sup>2</sup> is shown in the plots in Fig. 1. The first row shows branch coverage, the second loop coverage. Mutation score and surviving mutation types, shown in the last two rows of Fig. 1, are discussed in Sect. 6.2.

**Fig. 1.** Coverage achieved - Branch and Loop, mutation score and percentage of surviving mutations by type for each subject program in the 3 benchmark suites.

<sup>2</sup> 20 applications in Parboil counting different test suites separately, 6 in Scan/Rodinia, and 20 in PolyBench.

**Barrier Coverage** is not shown in the plots since, for all but one of the applications with barriers, the associated test suites achieved 100% barrier coverage. The only subject program with less than 100% barrier coverage was scan, which had 87.5% barrier coverage. The uncovered barrier is in a loop whose condition does not allow some threads to enter the loop, resulting in barrier divergence between threads. We find that less than 100% barrier coverage is a useful indicator of barrier divergence in code.

**Branch Coverage.** For most subject programs in Parboil and Scan/Rodinia, test suites achieve high branch coverage (>83%). The histo benchmark is an outlier with a low branch coverage of 31.6%. Its kernel function, histo main, contains 20 branches in a code block handling an exception condition (overflow). The test suite provided with histo does not raise the overflow exception, and as a result, these branches are never executed. We found that uncovered branches in other applications in Parboil and Scan/Rodinia, with >80% coverage, also result from exception handling code that is not exercised by the associated test data.

Branch coverage achieved for 13 of the 20 applications in PolyBench is at 50%. This is very low compared with other benchmark suites. Upon investigating the kernel code, we found that all the uncovered branches reside within a condition check for out of range array index. Tests associated with a majority of the applications did not check out of range array index access, resulting in low branch coverage.

**Loop Coverage.** Test suites for nearly all applications (with loops) execute loops more than once. Thus, coverage for >1 iterations is 100% for all but one of the applications, srad in the Rodinia suite, which has 80%. The uncovered loop in srad is in an uncovered then-branch that checks exception conditions. We also checked if the boundary value in loop conditions is reached when >1 iterations is covered by test executions. We found pathfinder in Rodinia to be the only application to have full coverage for >1 iterations but not reach the boundary value. This unusual scenario in pathfinder arises because one of the loops is exited using a break statement.

We find that test suites for most applications are unable to achieve any loop coverage for 0 and exactly 1 iteration. The boundary condition for most loops is based on the size of the work-groups, which is typically much greater than 1. As a result, test suites have been unable to skip the loop or execute it exactly once. The only exceptions were four applications in the Parboil suite (bfs, cutcp, mri-gridding, spmv) and two applications in Rodinia (lud, srad), which have boundary values dependent on variables that may be set to 0 or 1.

**Overhead.** For each benchmark and associated test suite, we assessed the overhead introduced by our approach. We compared the time needed for executing the benchmark with instrumentation and the additional data structures that we introduced for coverage measurement against the original unchanged benchmark. Overhead varied greatly across benchmarks and test suites. Overhead for Parboil and Rodinia benchmarks was in the range of 2% to 118%. Overhead was lower for benchmarks that took longer to execute, as the additional execution time from instrumentation is a smaller fraction of the overall time. Overhead for most programs in PolyBench ranges from 2% to 70%, which is similar to the Parboil and Rodinia benchmarks. The overhead for the lu, fdtd-2d and jacobi-2d-imper programs is >100%. The code for kernel computations in these benchmarks is small with fast execution. Consequently, the relative increase in code size and execution time after instrumentation with CLTestCheck is high.

#### **6.2 Fault Finding**

Fault finding for the subject programs is assessed using the mutants we generate with the fault seeder, described in Sect. 4. The mutation score, the percentage of mutants killed, is used to estimate the fault finding capability of the test suites associated with the subject programs. Each test suite associated with a benchmark is run 20 times to determine the killed mutants. A mutant is considered killed if the test suite generates different outputs on the mutant than on the original program in *all* 20 repeated runs of the test suite. In addition to killed mutants, we also report results on "Undecided Mutants", which refers to mutants that are killed in at least one of the executions of the test suite, but *not all* 20 repeated executions. Changes in GPU thread scheduling between runs cause this uncertainty. We do not count the undecided mutants towards killed mutants in the mutation score. Mutation score for all subject programs in each benchmark suite is shown in the third row of plots in Fig. 1.
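The classification rule above can be written down directly. The sketch below is in Python with hypothetical names; it assumes each mutant comes with a non-empty list of per-run verdicts (`True` meaning the test suite's output differed from the original program in that run).

```python
def classify_mutant(killed_per_run):
    """Classify one mutant from its per-run verdicts (True = killed in that run)."""
    if all(killed_per_run):
        return "killed"
    if any(killed_per_run):
        return "undecided"
    return "survived"


def mutation_score(verdicts):
    """Fraction of mutants killed in every run; undecided mutants do not count."""
    labels = [classify_mutant(runs) for runs in verdicts]
    return labels.count("killed") / len(labels)


# Four mutants with four runs each (the experiments use 20 runs).
verdicts = [
    [True, True, True, True],
    [True, False, True, True],      # killed in some runs only
    [False, False, False, False],
    [True, True, True, True],
]
print([classify_mutant(v) for v in verdicts], mutation_score(verdicts))
```

Counting undecided mutants as not killed makes the reported score a conservative estimate.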

**Mutation Score.** In general, we find that test suites for subject programs achieving high branch, barrier and loop coverage also have a high mutation score. For instance, the test suites for spmv and stencil, which achieve 100% coverage, also achieve a 100% mutation score. An instance of a program that does not follow this trend is mri-gridding, which has 100% branch, barrier, and loop (>1 iterations) coverage but only an 82% mutation score. On analysing the surviving mutants, we found that a significant fraction (160 out of 232) were arithmetic operator mutations within a function named *kernel value* that contained variables defining a fourteenth-order polynomial and a cubic polynomial. The effect of mutations on the polynomials did not propagate to the output of the benchmark with the given test suite. The histo program, with low branch coverage and 100% barrier and loop coverage, has a 65.9% mutation score. Nearly two thirds of the branches in histo cannot be reached by the input data; as a result, none of the mutations in the unexecuted branches are killed, resulting in a low mutation score. A few of the programs in PolyBench have mutation scores between 60% and 70%. In these programs, most surviving mutations are arithmetic operator mutations.

As seen in the last row of Fig. 1 showing surviving mutations by operator type, arithmetic operators are the dominant surviving mutations in all three benchmark suites. Control flow adequate tests can kill arithmetic operator mutations only if they propagate to a control condition or the output. Data flow coverage may be better suited for estimating these mutations. Around 20% of relational operator mutations also survive in our evaluation. Most of the surviving relational operator mutations made slight changes to operators, such as < to <=, or > to >=, and vice versa. The test suites provided with the benchmarks missed such boundary mutations.

**Undecided mutants** occur during executions of 9 out of the 46 subject programs and test suites across all three benchmark suites. The number of undecided mutants during the 9 executions is generally small (≤ 5). The only exception is tpacf in the Parboil benchmark suite, which resulted in 18 undecided mutants when executing one of its test suites. Undecided mutants point to non-deterministic behaviour in the kernel that is dependent on the GPU thread execution model. A large number of undecided mutants is alarming, and developers should examine kernel code more closely to ensure that the behaviour observed is as intended.

**Barriers** were not used in all benchmarks. Only 5 out of the 9 benchmarks in Parboil, and 4 of the 6 in Scan/Rodinia, had barriers. PolyBench programs did not use any barriers. Mutations removed barrier function calls in these benchmarks and we recorded the number of mutants killed by test suites. The percentage of killed barrier mutations is generally low across all benchmarks with barriers. For instance, removing 2 out of 3 barriers in the histo program in Parboil, and removing all barriers in the cutcp program, had no effect on the outputs of the respective program executions. This may either mean that the test suites are inadequate with respect to the barrier mutations, or it could be an indication that these barriers are superfluous with respect to program outputs, and the need for synchronisation should be further justified. For the programs in our experiment, we found barriers whose mutations survived to be unnecessary.

**Coverage versus Mutation Score.** The plots in Fig. 1 illustrate total mutation score over all types of mutations for each subject program and test suite. We also compute mutation scores specifically for branches, barriers, and loops using mutations relevant to them. We do this to compare against branch, barrier and loop coverage achieved for each of the subject programs. We found that mutation score for branches closely follows branch coverage for most subject programs. Outliers include adi, nn, convolution-2d and convolution-3d. Mutations that change < to <= are not killed in these kernels; these comprise one third of all branch mutations.

Mutation score for barriers is quite different from barrier coverage. This is because test suites are able to execute the barriers and achieve coverage. However, they are unable to produce different outputs when the barriers are removed. This may be a problem with the superfluous manner in which barriers are used in these programs.

Loop coverage with >1 iterations is 100% for all but one subject program (srad in Rodinia). Mutation score for loops, on the other hand, is variable. In general, tests achieving loop coverage are unable to reveal loop boundary mutations. Histo and srad are worth noting, with high loop coverage but low loop mutation scores. We find that mutations to the loop boundary value in these two benchmarks survive, which implies that accesses to loop indices outside the boundary go unchecked in these programs. These unsafe values of loop indices should be disallowed in these kernels, and loop boundary mutations in our fault seeder help reveal them.

#### **6.3 Schedule Amplification: Deadlocks and Data Races**

**Kernel Deadlocks:** When we used the CLTestCheck schedule amplifier on our benchmarks, we found that kernel executions deadlock when the work-group ID selected to go first exceeds the number of available compute units. As there are no guarantees on how work-groups are mapped to compute units, we allow work-group IDs exceeding the number of compute units to go first in some test executions using our schedule amplifier. However, it appears that the GPU makes unstated assumptions on what work-group IDs are allowed to go first. As noted by Sorensen et al. [22], "execution of large number of work-groups is in an *occupancy bound* fashion, by delaying the scheduling of some work-groups until others have executed to completion". They observed deadlocks in kernel execution due to inter work-group barriers. However, in the benchmarks in our evaluation, there is no explicit inter work-group barrier. It may be the case that developers made implicit assumptions on inter work-group barriers using the occupancy bound model and our schedule amplification approach violates this assumption. Nevertheless, our finding exposes the need for an inter work-group execution model that explicitly states the details and assumptions related to the mapping of work-groups to compute units for a given kernel on a given GPU platform.

**Inter Work-group Data Races:** We were able to reveal a data race in the spmv application from the Parboil benchmark suite. We found that when work-group 0 or 1 is chosen to go first in our schedules, the kernel execution always produces the same result. However, when we pick other work-group ids to go first, the test output is not consistent. Among twenty executions for each schedule, the frequency of producing correct output varies from 45% to 70%.

We observe similar behaviour in the tpacf application in Parboil when we delete the last barrier function call in the kernel. The kernel execution produces consistent outputs when we pick work-group 0 or 1 to go first. When we pick other work-groups to go first using our schedule amplifier, the kernel execution results are non-deterministic.

We observe no unusual behaviour in any of the PolyBench programs. These programs split the computation into multiple kernels and the CPU program launches GPU kernels one by one. The transfer of control from the GPU to the CPU between kernels acts like a barrier as the CPU will wait until a kernel finishes before launching the next kernel. In addition, care has been taken in the kernel code to ensure threads do not access the same memory location. As a result, we observe no data races in PolyBench with our schedule amplifier.

## **7 Conclusion**

We have presented the CLTestCheck framework for measuring test effectiveness over OpenCL kernels, with capabilities for code coverage measurement, fault seeding and mutation score measurement, and amplification of the execution of a test input with multiple work-group schedules to check inter work-group interactions. Our empirical evaluation of CLTestCheck capabilities with 82 publicly available kernels revealed the following:


In sum, the CLTestCheck framework is an automated, effective and useful tool that will help developers assess how well OpenCL kernels have been tested, identify kernel regions that require further testing, and uncover bugs with respect to work-group schedules. In the future, we plan to add further metrics, like data flow coverage with work-group schedules, to strengthen test adequacy measurement.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Implementing SOS with Active Objects: A Case Study of a Multicore Memory System**

Nikolaos Bezirgiannis<sup>1</sup>, Frank de Boer<sup>1</sup>, Einar Broch Johnsen<sup>2(B)</sup>, Ka I Pun<sup>2,3</sup>, and S. Lizeth Tapia Tarifa<sup>2</sup>

<sup>1</sup> CWI, Amsterdam, The Netherlands {n.bezirgiannis,f.s.de.boer}@cwi.nl <sup>2</sup> Department of Informatics, University of Oslo, Oslo, Norway {einarj,violet,sltarifa}@ifi.uio.no <sup>3</sup> Western Norway University of Applied Sciences, Bergen, Norway

**Abstract.** This paper describes the development of a parallel simulator of a multicore memory system from a model formalized as a structural operational semantics (SOS). Our implementation uses the Abstract Behavioral Specification (ABS) language, an executable, active object modelling language with a formal semantics, targeting distributed systems. We develop general design patterns in ABS for implementing SOS, and describe their application to the SOS model of multicore memory systems. We show how these patterns allow a formal correctness proof that the implementation simulates the formal operational model and discuss further parallelization and fairness of the simulator.

## **1 Introduction**

Structural operational semantics (SOS) [1], introduced by Plotkin in 1981, describes system behavior as transition relations in a syntax-oriented, compositional way, using inference rules for local transitions and their composition. Process synchronization in SOS rules is expressed abstractly using, e.g., assertions over system states and reachability conditions over transition relations as premises, and label synchronization for parallel transitions. This high level of abstraction greatly simplifies the verification of system properties, but not the simulation of system behavior as execution quickly becomes a reachability problem with a lot of backtracking. In this paper, we study how to implement a parallel simulator with a formal correctness proof from a SOS model, in terms of a case study of a multicore memory system. Such a correctness proof requires that the implementation language is also defined formally by an operational semantics.

Supported by *SIRIUS: Centre for Scalable Data Access* (www.sirius-labs.no) and *ADAPt: Exploiting Abstract Data-Access Patterns for Better Data Locality in Parallel Processing* (www.mn.uio.no/ifi/english/research/projects/adapt/).

© The Author(s) 2019

R. Hähnle and W. van der Aalst (Eds.): FASE 2019, LNCS 11424, pp. 332–350, 2019. https://doi.org/10.1007/978-3-030-16722-6_20

A major challenge in software engineering is the exploitation of the computational power of multicore (and manycore) architectures. One important aspect of this challenge is the memory systems of these architectures. These memory systems generally use caches to avoid bottlenecks in data access from main memory, but caches introduce data duplication and require protocols to ensure coherence. Although data duplication is usually not visible to the programmer, the way a program interacts with these copies largely affects performance by moving data around to maintain coherence. To develop, test and optimize software for multicore architectures, we need correct, executable models of the underlying memory systems. A SOS model of multicore memory systems with correctness proofs for cache coherency has been described in [2], together with a prototype implementation in the rewriting logic system Maude [3]. However, this fairly direct implementation of the SOS model is not well suited to simulate large systems.

This paper considers an implementation of the SOS model in ABS [4], a language tailored to the description of distributed systems based on active objects [5]. ABS is formally defined by an operational semantics and supports parallel execution on backends in Erlang, Haskell, and Java. The following features of ABS allow a high-level, coarse-grained view of the execution of different method invocations by different active objects: encapsulation of local state in active objects, communication using asynchronous method calls and futures, and cooperative scheduling of the method invocations of an active object. Our case study fully exploits these features and the resulting abstractions to correctly implement the complex process synchronization of the original SOS model.

The main contributions of this paper are as follows:


## **2 An Abstract Model of a Multicore Memory System**

Design decisions for a program running on top of a multicore memory system can be explored using simulators based on abstract models. Bijo et al. [2,6] developed a model which takes as input tasks (expressed as data access) to be executed, the corresponding data layout in main memory (indicating where data is allocated), and a parallel architecture consisting of cores with private multi-level caches and shared memory (see Fig. 1). Additionally, the model is configurable in the number of cores, the number and size of caches, and the associativity and replacement policy. Memory is organized in blocks which move between caches and main memory. For simplicity, the model assumes that the size of cache lines and memory blocks in main memory coincide, abstracts from the data content of memory blocks, and transfers memory blocks from the caches of one core to the caches of another core via main memory.

Tasks from the program are scheduled for execution from a shared task pool. Task execution on a core requires memory blocks to be transferred from main memory to the closest cache. Each cache has a pool of fetch/flush instructions to move blocks among caches and between caches and main memory. Consistency between multiple copies of a memory block is ensured using the standard cache coherence protocol MSI (e.g., [7]), with which a cache line is either modified, shared or invalid. A *modified* cache line has the most recent value of the memory block, therefore all other copies are *invalid* (including the one in main memory). A *shared* cache line indicates that all copies of the block are consistent. The protocol's messages are broadcast to the cores. The details of the broadcast (e.g., on a mesh or a ring) can be abstracted into an *abstract communication medium*. Following standard nomenclature, *Rd* messages request *read* access and *RdX* messages *read exclusive* access to a memory block. The latter invalidates other copies of the same block in other caches to provide write access.

**Fig. 1.** Abstract model of a multicore memory system.

To access data from a block n, a core looks for n in its local caches. If n is not found in shared or modified state, a *read request* !*Rd*(*n*) is broadcast to the other cores and to main memory. The cache can *fetch* the block when it is available in main memory. Eviction is required if the cache is full. Writing to block n requires n to be in shared or modified state in the local cache; if it is in shared state, an *invalidation request* !*RdX*(*n*) is broadcast to obtain exclusive access. If a cache with block n in modified state receives a read request ?*Rd*(*n*), it *flushes* the block to main memory; if a cache with block n in shared state receives an invalidation request ?*RdX*(*n*), the cache line will be *invalidated*; the requests are discarded otherwise. Read and invalidation requests are broadcast instantaneously in the abstract model, reflecting that signalling on the communication medium is orders of magnitude faster than moving data to or from main memory.
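How a cache reacts to incoming requests can be summarised in a small Python sketch. The function name and the tag `inv` for the invalid state are hypothetical, and the line becoming shared after a flush follows the usual MSI convention rather than anything stated above.

```python
from enum import Enum


class Status(Enum):
    MODIFIED = "mo"
    SHARED = "sh"
    INVALID = "inv"  # assumed tag name for the invalid state


def on_receive(status, message):
    """React to a broadcast request for a block currently held with `status`.

    Returns (new_status, flush), where `flush` says whether the block is
    written back to main memory. Requests that do not apply are discarded.
    """
    if message == "Rd" and status is Status.MODIFIED:
        # Flush to main memory; assumed (standard MSI) to leave the line shared.
        return Status.SHARED, True
    if message == "RdX" and status is Status.SHARED:
        return Status.INVALID, False
    return status, False  # request discarded


for status in Status:
    for message in ("Rd", "RdX"):
        print(status.name, message, on_receive(status, message))
```

The sketch covers only the receiving side; sending !*Rd*(*n*) or !*RdX*(*n*) and fetching from main memory are separate steps in the model.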


**Fig. 2.** Syntax of runtime configurations, where over-bar denotes sets (e.g., $\overline{CR}$).

#### **2.1 Formalization of the Multicore Memory System as an SOS Model**

An operational meaning for the abstract model described above has been defined using structural operational semantics (SOS) [1] with labeled transitions to model broadcast in the abstract communication medium. The resulting formalization [2,6] is shown to guarantee standard correctness properties for data consistency and cache coherence from the literature [8,9], including the preservation of program order in each core, the absence of data races, and no access to stale data. We briefly outline the main aspects of the formal model. The runtime syntax is given in Fig. 2. A configuration *cf* consists of main memory M, cores CR, caches *Ca*, and tasks *dap* to be scheduled. (We syntactically abuse set operations for multisets, including union $\cup$ and subtraction $\setminus$.) A core cid • *rst* with identifier cid executes runtime statements *rst*. A cache with identifier caid has a local cache memory M and data instructions *dst*. We assume that caid encodes the cid of the core to which the cache belongs and its level in the cache hierarchy. We denote by *Status* ∪ {⊥} the extension of the set of status tags with the undefined value ⊥. Thus, a memory M : *Address* → *Status* ∪ {⊥} maps an address n either to a status tag in *Status* or to ⊥ if the memory block with address n is not found in M.

*Data access patterns dap* model tasks consisting of **read**(r) and **write**(r) operations to references r and control flow operations for sequential composition $dap_1; dap_2$, non-deterministic choice between $dap_1$ and $dap_2$, repetition $dap^*$, task creation **spawn**(*dap*), and **commit**, which flushes the entire cache after task execution. The empty access pattern is denoted ε. Cores execute *runtime statements rst*, which extend *dap* with **readBl**(r) and **writeBl**(r) to block execution while waiting for data. Caches execute *data instructions dst* to fetch and flush the memory block with address n; here, **fetchBl**(n) blocks execution while waiting for data, and **flush** flushes the entire cache.
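To make the shape of data access patterns concrete, the grammar can be rendered as an algebraic data type. The Python class names below are hypothetical and only mirror the constructs listed above; `references` is an illustrative helper, not part of the model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Read:
    ref: str

@dataclass(frozen=True)
class Write:
    ref: str

@dataclass(frozen=True)
class Seq:          # sequential composition
    first: "Dap"
    second: "Dap"

@dataclass(frozen=True)
class Choice:       # non-deterministic choice
    left: "Dap"
    right: "Dap"

@dataclass(frozen=True)
class Star:         # repetition
    body: "Dap"

@dataclass(frozen=True)
class Spawn:        # task creation
    task: "Dap"

@dataclass(frozen=True)
class Commit:
    pass

@dataclass(frozen=True)
class Empty:        # the empty access pattern
    pass

Dap = Union[Read, Write, Seq, Choice, Star, Spawn, Commit, Empty]


def references(dap):
    """Collect the references read or written anywhere in a pattern."""
    if isinstance(dap, (Read, Write)):
        return {dap.ref}
    if isinstance(dap, Seq):
        return references(dap.first) | references(dap.second)
    if isinstance(dap, Choice):
        return references(dap.left) | references(dap.right)
    if isinstance(dap, Star):
        return references(dap.body)
    if isinstance(dap, Spawn):
        return references(dap.task)
    return set()  # Commit and Empty touch no reference


task = Seq(Read("x"), Seq(Star(Write("y")), Commit()))
print(sorted(references(task)))
```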

The *abstract communication medium* allows messages from one cache to be transmitted to the other caches and to main memory in a parallel instantaneous broadcast. Communication in the abstract communication medium is formalized in terms of label matching on transitions. The formal syntax for this label mechanism is as follows:

$$S ::= !Rd(n) \; | \; !RdX(n) \; \qquad \qquad R ::= ?Rd(n) \; | \; ?RdX(n) \;$$

Here, for any address n, a request of the form !*Rd*(*n*) or !*RdX*(*n*) is sent by one node, and its dual, of the form *dual*(!*Rd*(*n*)) = ?*Rd*(*n*) or *dual*(!*RdX*(*n*)) = ?*RdX*(*n*), is broadcast to the rest of the nodes and main memory. The syntax of the model is further detailed in [2,6].

#### **2.2 Local and Global SOS Rules**

The semantics is divided into local and global rules. Local rules capture interaction inside a node containing a core and the hierarchy of caches. Global rules capture synchronization and coordination between different nodes and main memory. In an *initial* configuration $cf_0$, all blocks in main memory M have status *sh*, all cores are idle, all caches are empty, and the task pool in *dap* has a single task representing the main block of a program. Let $cf \xrightarrow{*} cf'$ denote an execution starting from $cf$ and reaching $cf'$ by applying global transition rules, which in turn apply local transition rules for each core and its cache hierarchy. In the rules, let the auxiliary function *addr*(r) return the address n of the block containing reference r, *cid*(caid) the identity of the core associated with cache caid, *lid*(caid) the cache level of caid, and *status*(M, n) the status of block n in map M. Let the predicate *first*(caid) hold when caid is the first level and *last*(caid) when caid is the last level cache. Note that unlabelled transitions $\to$ can be executed asynchronously, while labelled transitions $\xrightarrow{S}$ require synchronization between all the nodes and main memory (see Figs. 3 and 4). We discuss some representative rules for the local and global levels of the SOS model. The full SOS formalization can be found in [6].

**Local semantics.** The first rules of Fig. 3 involve a core and its first level cache. In PrRd<sub>1</sub>, reading reference r succeeds if the block containing r is available. Otherwise, in PrRd<sub>2</sub> a **fetch**(n) instruction is added to the data instructions *dst* of the first level cache and further execution of the core is blocked by **readBl**(r). Writing to r only succeeds if the associated memory block has *mo* status in the first level cache. If the cache line is shared, the core broadcasts a !*RdX*(*n*) request to acquire exclusive access, where the broadcast appears as a label on the transition in PrWr<sub>2</sub>. Otherwise, the block must be fetched from main memory in PrWr<sub>3</sub> and **writeBl**(r) blocks execution.
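A rough Python rendering of these core-level steps is given below. It is a sketch under assumptions: `BLOCK_SIZE` and the `addr` layout are hypothetical, a cache is a plain dictionary from block addresses to status tags, and the shared line is assumed to become *mo* in the same step that emits the !*RdX*(*n*) label.

```python
BLOCK_SIZE = 4  # hypothetical number of references per memory block


def addr(r):
    """Address of the block containing reference r (assumed layout)."""
    return r // BLOCK_SIZE


def core_read(cache, dst, r):
    """One read step; returns the statement the core continues with (None = done)."""
    n = addr(r)
    if cache.get(n) in ("sh", "mo"):   # block available: the read succeeds
        return None
    dst.append(("fetch", n))           # otherwise queue a fetch in the cache ...
    return ("readBl", r)               # ... and block the core


def core_write(cache, dst, r):
    """One write step; returns (next statement, broadcast label)."""
    n = addr(r)
    status = cache.get(n)
    if status == "mo":                 # exclusive access already held
        return None, None
    if status == "sh":                 # request exclusive access via !RdX(n)
        cache[n] = "mo"                # assumption: upgrade happens in this step
        return None, ("!RdX", n)
    dst.append(("fetch", n))           # block missing: fetch it, then retry
    return ("writeBl", r), None


cache, dst = {0: "sh"}, []
print(core_read(cache, dst, 2), core_read(cache, dst, 5), dst)
print(core_write(cache, dst, 1), cache)
```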

For the remaining rules of Fig. 3, LC-Hit<sub>1</sub> and LC-Miss<sub>1</sub> capture interactions between adjacent levels of caches, and LCC-Miss<sub>1</sub> local state change in a cache line. If cache caid<sub>*i*</sub> needs a block n that is *sh* or *mo* in the next level cache, the address where block n should be placed is decided by a function *select*(M<sub>*i*</sub>, n) which reflects the cache associativity and the replacement policy. If eviction is needed, block n in caid<sub>*j*</sub> will be swapped with the selected block in caid<sub>*i*</sub> in LC-Hit<sub>1</sub>. LC-Miss<sub>1</sub> shows how **fetch**(n)-instructions propagate to lower cache levels: **fetch**(n) is replaced by **fetchBl**(n) in caid<sub>*i*</sub> and added to the data instructions in caid<sub>*j*</sub>. If the block cannot be found in any local cache, we have a *cache miss*: execution is blocked by **fetchBl**(n) and a read request !*Rd*(*n*) is broadcast, represented by the label in LLC-Miss<sub>1</sub>.

**Fig. 3.** Local transition rules.

**Fig. 4.** Global transition rules.

**Global semantics.** The global rules synchronize the cache hierarchies of different cores and main memory, and ensure coherence. Selected global rules are given in Fig. 4. Rule Synch<sub>1</sub> captures a global step with synchronization on a label S, which can be either !*Rd*(*n*) or !*RdX*(*n*). The request will be broadcast to other caches. To maintain data consistency, these caches must process the requests at the same time. The receiving label R is the *dual* of S. For synchronization, the transition is decomposed into a premise for main memory with label R and another premise for the caches with label S. Rule Synch<sub>2</sub> distributes the receiving label to caches *Ca*<sub>2</sub>, which do not belong to the cache hierarchy of the sender core CR<sub>1</sub>. The predicate *belongs*(*Ca*, CR) expresses that any cache in *Ca* belongs to exactly one core in CR. Rule Asynch captures parallel transitions without a label. These transitions can be local to individual nodes and caches, parallel memory accesses, or the parallel spawning and scheduling of new tasks.

## **3 The ABS Model of the Multicore Memory System**

In this section we outline the translation of the formal model into an executable object-oriented model using the ABS modeling language. We first briefly introduce the language and later explain the structural and behavioural correspondence between these two models, with a focus on the main challenges.

## **3.1 The ABS Language**

ABS is a modeling language for designing, verifying, and executing concurrent software [4]. The language combines the syntax and object-oriented style of Java with the Actor model of concurrency [10] into active objects which decouple communication and synchronization using asynchronous method calls, futures and cooperative scheduling [5]. Although only one thread of control can execute in an active object at any time, cooperative scheduling allows different threads to interleave at explicitly declared points in the code. Access to an object's fields is encapsulated, so any non-local (outside of the object) read or write to fields must happen explicitly via asynchronous method calls so as to mitigate race-conditions or the need for mutual exclusion (locks).

We explain the basic mechanism of asynchronous method calls and cooperative scheduling in ABS by the simple code example of a class Bus. First, the execution of a statement res = **await** o!m(args) consists of storing a message m(args), corresponding to the asynchronous call, in the message pool of the callee object o. This **await** statement *releases the control* of the caller until the return value of that method has been received. Releasing the control means that the caller can execute other messages from its own message pool in the meantime. ABS supports the shorthand o.m(args) to make an asynchronous call f=o!m(args) followed by the operation f.**get**, which *blocks* the caller object (does not release control) until the future f has received the return value from the call. As a special case, the statement **this**.m(args) models a self-call, which corresponds to a standard subroutine call and avoids this blocking mechanism. The code in Fig. 5 illustrates the use of the **await** statement on a Boolean condition to model a binary semaphore, which is used to enforce exclusive access to a communication medium implemented as a "bus". Thus, the statement **await** bus!lock_bus() suspends the calling method invocation (and releases control in the caller object). The caller is resumed once the generated invocation of the method lock_bus on the bus has completed, which happens after the local condition unlocked (of the bus) has become true.

**Fig. 5.** Bus lock implementation in ABS using await on Booleans.

**Fig. 6.** Class diagram of the ABS model.
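The behaviour of awaiting a Boolean condition can be approximated outside ABS. The following Python sketch uses `asyncio` as a stand-in for cooperative scheduling; it is not the ABS code of Fig. 5, and the names `lock_bus`, `release_bus`, `cache`, and `demo` are assumptions made for this illustration.

```python
import asyncio


class Bus:
    """Binary semaphore guarded by a Boolean field, in the spirit of Fig. 5."""

    def __init__(self):
        self.unlocked = True
        self._changed = asyncio.Condition()

    async def lock_bus(self):
        async with self._changed:
            # Analogue of awaiting a Boolean condition: suspend until it holds,
            # letting other tasks run in the meantime.
            await self._changed.wait_for(lambda: self.unlocked)
            self.unlocked = False

    async def release_bus(self):
        async with self._changed:
            self.unlocked = True
            self._changed.notify_all()


async def cache(name, bus, log):
    await bus.lock_bus()           # suspends this task while the bus is taken
    log.append(f"{name} enter")
    await asyncio.sleep(0)         # yield inside the critical section
    log.append(f"{name} leave")
    await bus.release_bus()


async def demo():
    bus, log = Bus(), []
    await asyncio.gather(*(cache(f"c{i}", bus, log) for i in range(3)))
    return log
```

Running `asyncio.run(demo())` yields a log in which every "enter" entry is directly followed by the matching "leave" entry, so the critical sections never interleave even though each task yields control while holding the bus.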

#### **3.2 The Structural View**

The runtime syntax of the SOS is represented by ABS classes, as outlined in Fig. 6. We briefly overview the translation. In ABS, object identifiers guarantee unique names and object references are used to capture how cores and caches are related. These references are encoded in a one-to-one correspondence with the naming scheme of the SOS.

A core cid • *rst* is translated into a class Core with a field currentTask representing the current task *rst*. Each core holds a reference to the first level cache. A cache memory caid • M • *dst* is translated into a class Cache with an interface ICache and a class parameter nextLevel. In a cache, nextLevel holds a reference to the next level cache. If this reference is Nothing, the cache is the last level cache (in the SOS, a predicate *last* is used to identify the last level). The field cacheMemory models the cache's memory M in the SOS. The process pool of each cache object in ABS represents the data instruction set *dst*.

An ABS configuration consists of a number of cores with their corresponding cache hierarchies, the main memory, a scheduler with tasks waiting to be scheduled, and the ABS classes Bus and Barrier, which model the abstract communication medium and the global synchronization with labels !*Rd*(*n*) and !*RdX*(*n*) in the SOS. The object diagram in Fig. 7 shows an initial configuration corresponding to the one depicted in Fig. 1.

**Fig. 7.** Object diagram of an initial configuration.

#### **3.3 The Behavioral View**

In this section we discuss the design patterns in ABS that implement the synchronization inherent in the SOS model. The combination of asynchronous method calls and cooperative scheduling is crucial because of the *multitasking* inherent in the SOS model, which requires objects to be able to process other requests while waiting; e.g., caches need to flush memory blocks while waiting for a fetch to succeed.

*Local synchronization* in the SOS model between two structural entities (e.g., two caches in rule LC-Hit<sub>1</sub> of Fig. 3) is implemented by the following synchronization pattern in ABS (see Fig. 8). Given two objects o<sub>1</sub> and o<sub>2</sub>, let o<sub>1</sub> execute method m<sub>1</sub>, which checks the local conditions of o<sub>1</sub> (highlighted as region **A** in Fig. 8). If these local conditions hold, method m<sub>2</sub> on o<sub>2</sub> is called asynchronously. Method m<sub>2</sub> completes when the local conditions of o<sub>2</sub> hold (highlighted as region **B** in Fig. 8). However, when m<sub>2</sub> has returned and object o<sub>1</sub> again schedules method m<sub>1</sub>, the conditions on object o<sub>2</sub> need no longer hold. Therefore, o<sub>1</sub> next calls the method m<sub>3</sub> *synchronously* to check these conditions again. If these conditions still hold, method m<sub>3</sub> returns successfully (in general, having updated o<sub>2</sub>), and we can proceed to do the local changes in o<sub>1</sub> (highlighted as region **C** in Fig. 8). Otherwise, the process needs to be repeated until we succeed. Note that method m<sub>3</sub> should not contain release points: because this method is called synchronously from a different object, a release point could introduce deadlocks in the caller object.

**Fig. 8.** Local synchronization between two ABS objects.

**Fig. 9.** Extract of ABS method fetch. When this code is reached, the requested cache line n has status invalid or it is not in the cache. The function select chooses a cache line to be swapped with n. If there is still free space in the cache, select returns Nothing. If n has either shared or modified status in the next level cache, the method swap removes the cache line with address n, inserts the selected cache line and returns the current status of n; otherwise, swap simply returns Nothing.

To illustrate the above protocol, consider the code snippet in Fig. 9, which corresponds to part of several rules in the SOS (in particular, rule LC-Hit<sub>1</sub>). Here, the current object **this** corresponds to caid<sub>i</sub> in the SOS, running method fetch, and the referenced object in nextCache corresponds to caid<sub>j</sub>. When fetch from nextCache returns, all the required conditions in nextCache are *True*. However, since the call is asynchronous, (some of) the conditions may no longer hold when execution continues in **this**. This is addressed by checking the return value of method swap: If swap returns an address, it means the conditions still hold and the necessary updates are performed both locally and in nextCache; otherwise (when swap returns Nothing) fetch will be called again.

*Global synchronization* in the SOS (see Fig. 10a) is modelled by matching labelled transitions. To simulate this instantaneous communication in ABS, we introduced the classes Bus and Barrier. The synchronization protocol is activated by asynchronous calls to the respective methods sendRd and sendRdX of the bus. The bus subsequently asynchronously calls the corresponding methods receiveRd and receiveRdX of the caches. Two barriers start and end are used by the caches to synchronize the start, as well as the completion, of the local executions of methods receiveRd and receiveRdX.

However, observe that objects in ABS are input enabled: it is always possible to call a method on an object. In our model, this scheme may give rise to inconsistent states: the local status of a memory location which triggers an asynchronous call of one of the methods sendRd and sendRdX of the bus may be invalidated by other bus synchronizations. Therefore, we add a lock to the bus (see Figs. 5 and 6), which is used to ensure exclusive access to the *message pool* of the bus when one of the methods read, write, and fetch is executed. The lock is released in case bus synchronization is not needed. The overall scheme is depicted in Fig. 10b. The exclusive access to the message pool of the bus guarantees that the message pool of the bus contains at most one call to one of the methods sendRd and sendRdX. Consequently, the triggering condition of the call cannot be invalidated before the call has been executed. This *strict* locking strategy decreases concurrency in the distributed system, but it reduces the complexity of the proof of equivalence between the SOS and the distributed implementation. We discuss how to further enhance the parallelization in Sect. 5.

(a) State machine of the global synchronization using labels in the SOS model.

(b) State machine of the global synchronization using a bus and barriers in the ABS model.

**Fig. 10.** Synchronization in SOS vs ABS. In the SOS model (a), circles represent nodes in the memory system and shaded arrows labelled transitions. Note that the bus is *implicit* in the SOS model, as synchronization is captured by label matching. In the ABS model (b), circles represent the same nodes as in the SOS model, shaded arrows method invocations, solid arrows mutual access to the bus object and dotted arrows barrier synchronizations.
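The interplay of the bus lock and the two barriers can be illustrated with Python threads. This is a sketch under simplifying assumptions and not the ABS classes Bus and Barrier: the lock is taken inside `send_rdx` rather than by the calling cache, main memory is omitted, and the status string `"inv"` is an assumed name.

```python
import threading


class Cache:
    def __init__(self, name, lines):
        self.name = name
        self.lines = dict(lines)        # block address -> "sh", "mo" or "inv"

    def receive_rdx(self, n, start, end):
        start.wait()                    # all receivers begin together
        if n in self.lines:
            self.lines[n] = "inv"       # invalidate the local copy
        end.wait()                      # all receivers complete together


class Bus:
    def __init__(self, caches):
        self.caches = caches
        self.lock = threading.Lock()    # at most one broadcast at a time

    def send_rdx(self, sender, n):
        with self.lock:
            receivers = [c for c in self.caches if c is not sender]
            # The bus itself is the extra party on both barriers.
            start = threading.Barrier(len(receivers) + 1)
            end = threading.Barrier(len(receivers) + 1)
            threads = [
                threading.Thread(target=c.receive_rdx, args=(n, start, end))
                for c in receivers
            ]
            for t in threads:
                t.start()
            start.wait()
            end.wait()
            for t in threads:
                t.join()
            sender.lines[n] = "mo"      # the sender now holds the only valid copy
```

With three caches that all hold block 5 as shared, a call `bus.send_rdx(caches[0], 5)` leaves the sender with status `"mo"` and the other two with `"inv"`; no second broadcast can start before the `end` barrier has been passed.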

## **4 Correctness**

In this section we discuss the correctness of the ABS model by means of a simulation relation between the transition system describing the semantics of the ABS model of the multicore memory system and the transition system described by the SOS model.

The semantics of an ABS model can be described by a transition relation between global configurations. A global configuration is a (finite) set of object configurations. An object configuration is a tuple of the form $\langle oid, \sigma, p, Q \rangle$, where *oid* denotes the unique identity of the object, σ assigns values to the instance variables (fields) of the object, p denotes the currently executing process, and Q denotes a set of (suspended) processes. A process is a closure (τ, S) consisting of an assignment τ of values to the local variables of the statement S.

We refer to [4] for the details of the structural operational semantics for deriving transitions $G \to G'$ between global configurations in ABS. Since in ABS concurrent objects only interact via asynchronous method calls and processes are scheduled non-deterministically (which provides an abstraction from the order in which the processes are generated by method calls), the ABS semantics satisfies the following global confluence property, which allows commuting consecutive computation steps of *independent* processes that belong to *different* objects. Two processes are independent if neither one is generated by the other by an asynchronous call.

**Lemma 1 (Global confluence).** *For any two transitions* $G \to G_1$ *and* $G \to G_2$ *that describe execution steps of independent processes of different objects, there exists a global configuration* $G'$ *such that* $G_1 \to G'$ *and* $G_2 \to G'$*.*

An object configuration is *stable* if the statement S to be executed has terminated or starts either with a **get** operation on a future or with an **await** statement on a Boolean condition or a future. A global ABS configuration is *stable* if all its object configurations are stable. Observe that our ABS model does not give rise to local divergent computations without passing through stable configurations; i.e., every local computation eventually enters a stable configuration. Together with the global confluence property in Lemma 1, this allows us to restrict the semantics of the ABS model in the simulation relation to stable global configurations; i.e., transitions $G \Rightarrow G'$ between stable global configurations $G$ and $G'$ which result from a (non-empty) sequence of local execution steps of a *single* process from one stable configuration to the next one.

Because of the global synchronization with the bus in ABS described above, we may also represent, without loss of generality, the synchronization on the bus by a *single* global transition $G \Rightarrow G'$ which involves a completed execution of the method sendRd(...) (or sendRdX(...)) by the bus. This is justified because the global confluence allows for a scheduling policy such that the execution of the processes that are generated by these methods, i.e., the calls of the methods receiveRd(...) (or receiveRdX(...)), is not interleaved with any other processes.

*The simulation relation.* The structural correspondence between a global configuration of the ABS model and a configuration of the SOS model is described in Sect. 3.2. For each method we have constructed a table which, among other things, associates with some so-called *observable* occurrences of **await** statements (appearing in the method body) a corresponding **dst** instruction. In general, the execution of the remaining (occurrences of) **await** statements, for which there does not exist a corresponding **dst** instruction, involves some asynchronous messaging *preparing* for the corresponding synchronous exchange of information in the SOS model. In some cases, the execution of these unobservable statements (e.g., in the read and write methods) also does not correspond to a change of the SOS configuration. Let α map every stable global configuration G of the ABS model to a structurally equivalent configuration α(G) of the SOS model, which additionally maps every observable process (either queued or active) to the associated **dst** instruction (a process is observable if its corresponding statement is observable).

We arrive at the following theorem which expresses that the ABS model is a correct implementation of the abstract model.

**Theorem 1.** *Let* $G$ *be a stable global configuration of the ABS model. If* $G \Rightarrow G'$ *then* $\alpha(G) \to^{*} \alpha(G')$*, where* $\to^{*}$ *denotes the reflexive, transitive closure of* $\to$*.*

*Proof.* The proof proceeds by a case analysis of the given transition $G \Rightarrow G'$, which, as discussed above, involves the local execution of some basic sequential code by a single object. For example, for the case of a completed execution of a method sendRd(...) (or sendRdX(...)) by the bus, a simple inspection of the sequential code of the methods that have been executed, e.g., sendRd(...) and receiveRd(...), suffices to establish the existence of a corresponding transition $\alpha(G) \to \alpha(G')$.

The remaining cases are captured by tables (as mentioned above) which provide for each method the following information. The statements in the **Location** column of each table represent for the respective method all possible processes generated by a call, i.e., a call to the method itself, and the processes which correspond to the **await** statements appearing in its body. In each row the **Next release point** statement indicates the next **await** statement or **return** statement that can be reached (statically). The **dst** instruction in each row specifies the instruction which corresponds to the **Location** statement in the simulation. Finally, **Enable condition** in each row specifies the enabling conditions (expressed in the abstract model) of the rule applications (of the abstract model) specified in **Rules**. In general these rule applications involve the sequential application of one or more rules. For unobservable statements, for which there is no corresponding **dst** instruction, the latter two columns are left unspecified.

The case analysis then consists of checking statically for each row the *local* structural correspondence between the resulting ABS process (the **Next release point**) and the resulting SOS configuration described by the specified rule applications.

## **5 Parallelism and Fairness of the ABS Model**

This section discusses how to relax the eager locking policy of the bus implementation without generating inconsistent states. Instead of locking the bus unconditionally when executing the read, write, and fetch methods in the ABS model, and releasing the lock when no bus synchronization is required, we only lock the bus when the triggering conditions of the bus synchronization may be invalidated. For example, an *optimistic* write implementation (see Fig. 11) tries to acquire the lock of the bus, and only after the acquisition checks whether a race-condition has happened and invalidated the shared status of the address n; in this case, the write method will *backtrack* and retry (by calling itself); otherwise the write operation can safely be performed.

**Fig. 11.** Alternative, optimistic implementation of the write method to detect a bus race-condition and, in that case, retry the operation.
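The optimistic strategy follows a check, lock, re-check shape that can be sketched in Python. This is an illustration only and not the ABS method of Fig. 11: the function name `optimistic_write`, the dictionary representation of the cache, the returned action strings, and the `broadcast_rdx` callback are assumptions.

```python
import threading


def optimistic_write(lines, n, bus_lock, broadcast_rdx):
    """Write to block n, taking the bus lock only when a broadcast is needed."""
    status = lines.get(n)
    if status == "mo":
        return "write"                  # exclusive copy: no bus access needed
    if status != "sh":
        return "fetch"                  # block must be fetched first
    with bus_lock:
        # Re-check after acquisition: another bus synchronization may have
        # invalidated the shared status while we waited for the lock.
        still_shared = lines.get(n) == "sh"
        if still_shared:
            broadcast_rdx(n)            # acquire exclusive access
            lines[n] = "mo"
    if not still_shared:
        # Race detected: backtrack and retry by calling the method again.
        return optimistic_write(lines, n, bus_lock, broadcast_rdx)
    return "write"
```

In the uncontended case, `optimistic_write({3: "sh"}, 3, threading.Lock(), print)` broadcasts once and upgrades the line to `"mo"`. If the line is invalidated before the lock is obtained, the retry sees the new status and falls through to the fetch case.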

The strict and relaxed variations of the global synchronization bear strong resemblance to conservative [11,12] and optimistic [13] algorithms, respectively, in parallel and distributed discrete-event simulation (PDES) [14]. As with PDES, there is no clear winner between the strict (conservative) and relaxed (optimistic) versions of our cache simulator; certain computer programs (input models) will be simulated faster using one version or the other, depending on the interdependency of the parallel components (for us, the caches). For a contrived experiment, we implemented a penalty system in the ABS model. A cache penalty is the cost (delay) incurred by failing to read or write to a particular level of cache, set here to costs of (1, 10, 100) for (L1, L2, L3) [15]. We compared the two versions for a scenario with full inter-dependency (simultaneous write instructions on the same memory block) and a scenario with minimal inter-dependency (write instructions on separate memory blocks) between 16 simulated cores. In these experiments, the strict version was slightly faster (by up to 2%) in the first case and slower by up to 12% in the second case. The experiments were executed using the ABS-Erlang backend [16] and Erlang version 21, running on a quad-socket machine with 8-core, 16-hyperthread Xeon® L7555 processors, which yielded 64 hardware threads in total.

*Fairness.* A concern that often arises in parallel execution is fairness: the degree of variability when distributing the computing resources among different parallel components—here, the simulated cores. Fairness of parallel execution can affect the simulation's accuracy in approximating the intended (or idealized) manycore hardware. To ensure fairness of the simulation, we make use of *deployment components* [17] in ABS.

A *Deployment Component* (DC) is an ABS execution location that is created with a number of virtual resources (e.g., execution speed, memory use, network bandwidth), which are shared among its deployed objects. Any annotated statement [Cost: x] S decrements by x the resources of its DC and then completes, or it will stall its computation if there are currently not enough resources remaining; the statement S may continue on the next passage of the global symbolic time, when all the resources of the DCs have been renewed, and will eventually complete when its Cost has reached zero.

**Table 1.** Total cache penalties between strict/relaxed, with/without DC configurations.
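The resource accounting described here can be made concrete with a small Python simulation. It is a deliberately simplified reading of the ABS semantics, assuming a single sequential process on one DC and a single resource type; the function name `completion_times` is hypothetical.

```python
def completion_times(costs, speed):
    """Symbolic time step at which each [Cost: x] statement completes.

    costs: cost annotations of consecutive statements of one process.
    speed: resources the DC receives at every step of symbolic time (> 0).
    """
    time, remaining, finished = 0, speed, []
    for cost in costs:
        while cost > 0:
            if remaining == 0:
                time += 1               # stall until the next time step ...
                remaining = speed       # ... when the resources are renewed
            used = min(cost, remaining)
            cost -= used
            remaining -= used
        finished.append(time)
    return finished
```

With `speed=1`, three statements of cost 1 complete at steps 0, 1 and 2, i.e., one statement per time interval. A single statement of cost 3 on a DC with `speed=2` consumes the budget of the first interval and completes in the second.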

We make use of this resource modeling of ABS to assign equal (fair) resources of virtual execution speed to the simulated cores of the system. Each Core object is deployed onto a separate DC with fixed Speed(1) resources. The processing of each instruction has the same cost [Cost: 1]. This is a generalization, since common processor architectures execute different instructions at different speeds (cycles per instruction); e.g., JUMP is faster than LOAD. The result is that all Cores can execute at most one instruction in every time interval of the global symbolic clock, and thus no Core can get too far ahead with processing its own instructions. This problem manifests itself in the parallel simulation of N cores using a physical machine of M cores, where N is vastly greater than M. To test this, we performed a write-congested experiment with a configuration of 20 simulated cores and 3 cache levels, comparing the strict and relaxed variations, with and without the use of deployment components. The results (shown in Table 1) were measured on a quad-core system running ABS-Erlang, counting the total cache penalties of all the cores. With respect to the strict variation, the results with and without DC have similar penalties; this can be attributed to the lock-step nature of strict bus synchronization, where no cache (and thus core) can unfairly stride forward. In the relaxed variation, however, where synchronization is less strict, we see that without the fairness imposed by DC, the penalties are almost halved, which means some cores are allowed to do multiple (successful) write operations while other cores are still waiting on the "backlog" to be simulated. This gives rise to fewer penalties, because there are fewer runtime interleavings of the simulated cores and thus less competition between them.

## **6 Related Work**

There is in general a significant gap between a formal model and its implementation [18]. SOS [1] succinctly formalizes operational models and is well-suited for proofs, but direct implementations of SOS quickly become very inefficient. Executable semantic frameworks such as Redex [19], rewriting logic [20,21], and K [22] reduce this gap, and have been used to develop executable formal models of complex languages like C [23] and Java [24]. The relationship between SOS and rewriting logic semantics has been studied [25] without proposing a general solution for label matching. Bijo et al. implemented their SOS multicore memory model [26] in the rewriting logic system Maude [3] using an orchestrator for label matching, but do not provide a correctness proof with respect to the SOS. Different semantic styles can be modeled and related inside one framework; for example, the correctness of distributed implementations of KLAIM systems in terms of simulation relations has been studied in rewriting logic [27]. Compared to these works on semantics, we implemented an SOS model in a distributed active object setting, and proved the correctness of this implementation.

Correctness-preserving compilation is related to correctness proofs for implementations, and ensures that the low-level representation of a program preserves the properties of the high-level model. Examples of this line of work include type-preserving translations into typed assembly languages [28] and formally verified compilers [29,30], which prove the semantic preservation of a compiler from C to assembler code, but leave shared-variable concurrency for future work. In contrast to this work, which studies compilation from one language to another, our work focuses on a specific model and its implementation and specifically targets parallel systems.

Simulation tools for cache coherence protocols can evaluate performance and efficiency on different architectures (e.g., gems [31] and gem5 [32]). These tools perform evaluations of, e.g., the cache hit/miss ratio and response time, by running benchmark programs written as low-level read and write instructions to memory. Advanced simulators such as Graphite [33] and Sniper [34] run programs on distributed clusters to simulate executions on multicore architectures with thousands of cores. Unlike our work, these simulators are not based on a formal semantics and correctness proofs. Our work complements these simulators by supporting the executable exploration of design choices from a programmer perspective rather than from a hardware design perspective. Compared to worst-case response time analysis for concurrent programs on multicore architectures [35], our focus is on the underlying data movement rather than the response time.

## **7 Conclusion**

We have introduced in this paper a methodology for implementing SOS models in the active object language ABS, and applied this methodology to the implementation of an SOS model of an abstraction of multicore memory systems, resulting in a parallel simulator for these systems. A challenge for this implementation is to correctly implement the synchronization patterns of the SOS rules, which may cross encapsulation barriers in the active objects, and in particular label synchronization on parallel transition steps. We prove the correctness of this particular implementation, exploiting that the ABS model allows for a high-level coarse-grained semantics. We investigated the further parallelization and fairness of the ABS model.

The results obtained in this paper provide a promising basis for further development of the ABS model for simulating the execution of (object-oriented) programs on multicore architectures. A first such development concerns an extension of the abstract memory model with data. In particular, having the addresses of the memory locations themselves as data allows modeling and simulating different data layouts of the dynamically generated object structures.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Optimal and Automated Deployment for Microservices**

Mario Bravetti<sup>1</sup>, Saverio Giallorenzo<sup>2(B)</sup>, Jacopo Mauro<sup>2</sup>, Iacopo Talevi<sup>1</sup>, and Gianluigi Zavattaro<sup>1</sup>

<sup>1</sup> FOCUS Research Team, University of Bologna/Inria, Bologna, Italy

<sup>2</sup> University of Southern Denmark, Odense, Denmark

saverio@imada.sdu.dk

**Abstract.** Microservices are highly modular and scalable Service Oriented Architectures. They underpin automated deployment practices like Continuous Deployment and Autoscaling. In this paper we formalize these practices and show that automated deployment — proven undecidable in the general case — is algorithmically treatable for microservices. Our key assumption is that the configuration life-cycle of a microservice is split into two phases: (i) creation, which entails establishing initial connections with already available microservices, and (ii) subsequent binding/unbinding with other microservices. To illustrate the applicability of our approach, we implement an automatic optimal deployment tool and compute deployment plans for a realistic microservice architecture, modeled in the Abstract Behavioral Specification (ABS) language.

## **1 Introduction**

Inspired by service-oriented computing, Microservices structure software applications as highly modular and scalable compositions of fine-grained and loosely-coupled services [18]. These features support modern software engineering practices, like continuous delivery/deployment [30] and application autoscaling [3]. Currently, these practices focus on single microservices and do not take advantage of the information on the interdependencies within an architecture. In contrast, architecture-level deployment supports the global optimization of resource usage and avoids "domino" effects due to unstructured scaling actions that may cause cascading slowdowns or outages [27,35,39].

In this paper, we formalize the problem of automatic deployment and reconfiguration (at the architectural level) of microservice systems, proving formal properties and presenting an implemented solution.

In our work, we follow the approach taken by the *Aeolus component model* [13–15], which was used to formally define the problem of deploying component-based software systems and to prove that, in the general case, this problem is undecidable [15]. The basic idea of Aeolus is to enrich the specification of components with a finite state automaton that describes their deployment life cycle. Previous work identified decidable fragments of the Aeolus model: e.g., removing replication constraints from Aeolus (e.g., used to specify a minimal number of services connected to a load balancer) makes the deployment problem decidable, but non-primitive recursive [14]; also removing conflicts (e.g., used to express the impossibility of deploying two types of components in the same system) makes the problem PSpace-complete [34] or even poly-time [15], but under the assumption that every required component can be (re)deployed from scratch.

Our intuition is that the Aeolus model can be adapted to formally reason on the deployment of microservices. To achieve our goal, we significantly revisit the formalization of the deployment problem, replacing Aeolus components with a model of *microservices*. The main difference between our model of microservices and Aeolus components lies in the specification of their deployment life cycle. Here, instead of using the full power of finite state automata (as in Aeolus and other TOSCA-compliant deployment models [10]), we assume microservices to have two states: (i) creation and (ii) binding/unbinding. Concerning creation, we use *strong* dependencies to express which microservices must be immediately connected to newly created ones. After creation, we use *weak* dependencies to indicate additional microservices that can be bound/unbound. The principle that guided this modification comes from state-of-the-art microservice deployment technologies like Docker [36] and Kubernetes [29]. In particular, the weak and strong dependencies have been inspired by Docker Compose [16] (a language for defining multi-container Docker applications), where it is possible to specify different relationships among microservices using, e.g., the depends_on (resp. external_links) modalities that force (resp. do not force) a specific startup order, similarly to our strong (resp. weak) dependencies. Weak dependencies are also useful to model horizontal scaling, e.g., a load balancer that is bound to/unbound from many microservice instances during its life cycle.

In addition, w.r.t. the Aeolus model, we also consider resource/cost-aware deployments, taking inspiration from the memory and CPU resources found in Kubernetes. Microservice specifications are enriched with the amount of resources they need to run. In a deployment, a system of microservices runs within a set of computation *nodes*. Nodes represent computational units (e.g., virtual machines in an Infrastructure-as-a-Service Cloud deployment). Each node has a cost and a set of resources available to the microservices it hosts.

On the model above, we define the *optimal deployment problem* as follows: given an initial microservice system, a set of available nodes, and a new target microservice to be deployed, find a sequence of reconfiguration actions that, once applied to the initial system, leads to a new deployment that includes the target microservice. Such a deployment is expected to be *optimal*, meaning that the total cost (i.e., the sum of the costs) of the nodes used is minimal. We show that this problem is decidable by presenting an algorithm working in three phases: (1) generate a set of constraints whose solution indicates the microservices to be deployed and their distribution over the nodes; (2) generate another set of constraints whose solution indicates the connections to be established; (3) synthesize the corresponding deployment plan. The set of constraints includes optimization metrics that minimize the overall cost of the computed deployment.

**Fig. 1.** Example of microservice deployment (blue boxes: nodes; green boxes: microservices; continuous lines: the initial configuration; dashed lines: full configuration). (Color figure online)

The algorithm has NEXPTIME complexity because, in the worst-case, the length of the deployment plan could be exponential in the size of the input. However, we consider this worst-case unfeasible in practice, as the number of microservices deployable on one node is limited by the available resources. Under the assumption that each node can host at most a polynomial amount of microservices, the deployment problem is NP-complete and the problem of deploying a system minimizing its total cost is an NP-optimization problem. Moreover, having reduced the deployment problem in terms of constraints, we can exploit state-of-the-art constraint solvers [12,23,24] that are frequently used in practice to cope with NP-hard problems.

To concretely evaluate our approach, we consider a real-world microservice architecture, inspired by the reference email processing pipeline from Iron.io [22]. We model that architecture in the Abstract Behavioral Specification (ABS) language, a high-level object-oriented language that supports deployment modeling [31]. We use our technique to compute two types of deployments: an initial one, with one instance for each microservice, and a set of deployments to horizontally scale the system depending on small, medium or large increments in the number of emails to be processed. The experimental results are encouraging in that we were able to compute deployment plans that add more than 30 new microservice instances, assuming availability of hundreds of machines of three different types, and guaranteeing optimality.

## **2 The Microservice Optimal Deployment Problem**

We model microservice systems as aggregations of components with ports. Each port exposes provided and required interfaces. Interfaces describe offered and required functionalities. Microservices are connected by means of bindings indicating which port provides the functionality required by another port. As discussed in the Introduction, we consider two kinds of requirements: strong required interfaces, that need to be already fulfilled when the microservice is created, and weak required interfaces, that must be fulfilled at the end of a deployment (or reconfiguration) plan. Microservices are enriched with the specification of the resources they need to properly run; such resources are provided to the microservices by nodes. Nodes can be seen as the unit of computation executing the tasks associated to each microservice.

As an example, in Fig. 1 we have reported the representation of the deployment of a microservice system inspired by the email processing pipeline that we will discuss in Sect. 3. Here, we consider a simplified pipeline. A Message Receiver microservice handles inbound requests, passing them to a Message Analyzer that checks the email content and sends the attachments for inspection to an Attachment Analyzer. The Message Receiver has a port with a *weak* required interface that can be fulfilled by Message Analyzer instances. This requirement is weak, meaning that the Message Receiver can be initially deployed without any connection to instances of Message Analyzer. These connections can be established afterwards and reflect the possibility to horizontally scale the application by adding/removing instances of Message Analyzer. This last microservice has instead a port with a *strong* required interface that can be fulfilled by Attachment Analyzer instances. This requirement is strong to reflect the need to immediately connect a Message Analyzer to its Attachment Analyzer.

Figure 1 presents a reconfiguration that, starting from the initial deployment depicted in continuous lines, adds the elements depicted with dashed lines. Namely, a couple of new instances of Message Analyzer and a new instance of Attachment Analyzer are deployed. This is done in order to satisfy numerical constraints associated to both required and provided interfaces. For required interfaces, the numerical constraints indicate lower bounds to the outgoing bindings, while for provided interfaces they specify upper bounds to the incoming connections. Notice that the constraint ≥ 3 associated to the weak required interface of Message Receiver is not initially satisfied; this is not problematic because constraints on weak interfaces are relevant only at the end of a reconfiguration. In the final deployment, such a constraint is satisfied thanks to the two new instances of Message Analyzer. These two instances need to be immediately connected to an Attachment Analyzer: only one of them can use the initially available Attachment Analyzer, because of the constraint ≤ 2 associated to the corresponding provided interface. Hence, a new instance of Attachment Analyzer is added.

We also model resources: each microservice has associated resources that it consumes (see the CPU and RAM quantities associated to the microservices in Fig. 1). Resources are provided by nodes, that we represent as containers for the microservice instances, providing them the resources they require. Notice that nodes have also costs: the total cost of a deployment is the sum of the costs of the used nodes (e.g., in the example the total cost is 598 cents per hour, corresponding to the cost of 4 nodes: 2 C4 large and 2 C4 xlarge virtual machine instances of the Amazon public Cloud).

We now move to the formal definitions. We assume the following disjoint sets: I for interfaces, Z for microservices, and a finite set R for kinds of resources. We use ℕ to denote natural numbers, ℕ<sup>+</sup> for ℕ \ {0}, and ℕ<sup>+</sup><sub>∞</sub> for ℕ<sup>+</sup> ∪ {∞}.

**Definition 1 (Microservice type).** *The set* Γ *of* microservice types*, ranged over by* T<sub>1</sub>, T<sub>2</sub>, ...*, contains 5-ples* ⟨P, D<sub>s</sub>, D<sub>w</sub>, C, R⟩ *where:*


*We assume sets* dom(D<sub>s</sub>)*,* dom(D<sub>w</sub>) *and* C *to be pairwise disjoint.*<sup>1</sup>

*Notation*: given a microservice type T = ⟨P, D<sub>s</sub>, D<sub>w</sub>, C, R⟩, we use the postfix projections .prov, .reqs, .reqw, .conf and .res to decompose it; e.g., T.reqw returns the partial function associating arities to weak required interfaces. In our example, for instance, the Message Receiver microservice type is such that Message Receiver.reqw(MA) = 3 and Message Receiver.res(RAM) = 4. When the numerical constraints are not explicitly indicated, we assume as default value ∞ for provided interfaces (i.e., they can satisfy an unlimited number of ports requiring the same interface) and 1 for required interfaces (i.e., one connection with a port providing the same interface is sufficient).
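As an illustration of the notation, a microservice type can be held in a small Python record. This is only a sketch: the class `MicroserviceType` and the helpers `provided` and `required` are hypothetical names introduced here, and only the two Message Receiver values quoted in the text are taken from the example.

```python
import math
from dataclasses import dataclass, field

INF = math.inf  # stands for the arity "∞" of an unbounded provided interface


@dataclass(frozen=True)
class MicroserviceType:
    """Hypothetical container mirroring the 5-ple (P, Ds, Dw, C, R)."""
    name: str
    prov: dict = field(default_factory=dict)  # interface -> capacity (may be INF)
    reqs: dict = field(default_factory=dict)  # strong required interface -> arity
    reqw: dict = field(default_factory=dict)  # weak required interface -> arity
    conf: frozenset = frozenset()             # conflicting interfaces
    res: dict = field(default_factory=dict)   # resource kind -> amount needed

    def __post_init__(self):
        # dom(Ds), dom(Dw) and C are assumed to be pairwise disjoint.
        s, w, c = set(self.reqs), set(self.reqw), set(self.conf)
        if s & w or s & c or w & c:
            raise ValueError("reqs, reqw and conf must be pairwise disjoint")


def provided(*interfaces):
    """Default capacity for provided interfaces: unbounded."""
    return {p: INF for p in interfaces}


def required(*interfaces):
    """Default arity for required interfaces: one binding suffices."""
    return {p: 1 for p in interfaces}


# Values quoted in the text: reqw(MA) = 3 and res(RAM) = 4.
message_receiver = MicroserviceType(
    "MessageReceiver", reqw={"MA": 3}, res={"RAM": 4}
)
```

The projections then read as attribute accesses, e.g. `message_receiver.reqw["MA"]`.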

Inspired by [14], we allow a microservice to specify a conflicting interface that, intuitively, forbids the deployment of other microservices providing the same interface. Conflicting interfaces can be used to express conflicts among microservices, preventing them from being present at the same time, or cases in which only one microservice instance can be deployed (e.g., a consistent and available microservice that cannot be replicated).

Since the requirements associated with strong interfaces must be immediately satisfied, it is possible to deploy a configuration with circular dependencies only if at least one weak required interface is involved in the cycle. In fact, having a cycle with only strong required interfaces would mean to deploy all the microservices involved in the cycle simultaneously. We now formalize a well-formedness condition on microservice types to guarantee the absence of such configurations.

**Definition 2 (Well-formed Universe).** *Given a finite set of microservice types* U *(that we also call* universe*), the strong dependency graph of* U *is as follows:* G(U) = (U, V) *with* V = {(T, T′) | T, T′ ∈ U ∧ ∃p ∈ I. p ∈ dom(T.reqs) ∩ dom(T′.prov)}*. The universe* U *is well-formed if* G(U) *is acyclic.*

<sup>1</sup> Given a partial function *f*, we use dom(*f*) to denote the domain of *f*, i.e., the set {*e* | ∃*e*′ : (*e*, *e*′) ∈ *f*}.

In the following, we always assume universes to be well-formed. Well-formedness does not prevent the specification of microservice systems with circular dependencies, which are captured by cycles with at least one weak required interface.
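The well-formedness condition of Definition 2 can be checked mechanically. The sketch below assumes a universe encoded as a plain dictionary (a representation chosen here for illustration, as are the function names) and relies on the standard `graphlib` module, available from Python 3.9.

```python
from graphlib import CycleError, TopologicalSorter


def strong_dependency_graph(universe):
    """universe: type name -> {"reqs": set of interfaces, "prov": set of interfaces}.

    Maps each type T to the set of types T' that provide an interface
    strongly required by T, i.e., the edges V of Definition 2.
    """
    return {
        t: {u for u, du in universe.items() if dt["reqs"] & du["prov"]}
        for t, dt in universe.items()
    }


def is_well_formed(universe):
    """True when the strong dependency graph is acyclic."""
    try:
        # static_order() raises CycleError if the graph contains a cycle.
        list(TopologicalSorter(strong_dependency_graph(universe)).static_order())
        return True
    except CycleError:
        return False


# Running example: only the strong requirement (Message Analyzer on AA) creates
# an edge; the weak requirement of Message Receiver on MA is ignored by design.
pipeline = {
    "MessageReceiver": {"reqs": set(), "prov": set()},
    "MessageAnalyzer": {"reqs": {"AA"}, "prov": {"MA"}},
    "AttachmentAnalyzer": {"reqs": set(), "prov": {"AA"}},
}

# A hypothetical universe with a purely strong cycle, hence not well-formed.
cyclic = {
    "A": {"reqs": {"x"}, "prov": {"y"}},
    "B": {"reqs": {"y"}, "prov": {"x"}},
}
```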

**Definition 3 (Nodes).** *The set* N *of* nodes *is ranged over by* o1, o2,... *We assume the following information to be associated to each node* o *in* N *.*


As an example, in Fig. 1, the node Node1 large is such that Node1 large.res(RAM) = 4 and Node1 large.cost = 100.

We now define configurations that describe systems composed of microservice instances and bindings that interconnect them. A configuration, ranged over by C1, C2,..., is given by a set of microservice types, a set of deployed microservices (with their associated type), and a set of bindings. Formally:

**Definition 4 (Configuration).** *A* configuration C *is a 4-ple* ⟨Z, T, N, B⟩ *where:*


In our example, if we use mr to refer to the instance of Message Receiver, and ma for the initially available Message Analyzer, we will have the binding (MA, mr, ma). Moreover, concerning the microservice placement function N, we have N(mr) = Node1 large and N(ma) = Node2 xlarge.
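The placement function N also determines the total cost of a configuration: only nodes in the range of N are paid for. The following sketch recomputes the 598 figure quoted earlier. The placements of `mr` and `ma` and the cost 100 of Node1 large come from the text; every other node name and placement is hypothetical, and the xlarge cost of 199 is not stated in the text but is the value implied by the total if both large nodes cost 100 and both xlarge nodes cost the same.

```python
def total_cost(placement, node_cost):
    """Sum the cost of every node that hosts at least one microservice."""
    return sum(node_cost[o] for o in set(placement.values()))


node_cost = {
    "Node1_large": 100,   # from the text
    "Node2_xlarge": 199,  # inferred: (598 - 2 * 100) / 2
    "Node3_xlarge": 199,  # hypothetical node
    "Node4_large": 100,   # hypothetical node
    "Node5_large": 100,   # hypothetical node, left unused below
}

placement = {
    "mr": "Node1_large",    # N(mr), from the text
    "ma": "Node2_xlarge",   # N(ma), from the text
    "aa": "Node3_xlarge",   # hypothetical placement
    "ma2": "Node4_large",   # hypothetical placement
    "ma3": "Node2_xlarge",  # hypothetical placement
}
```

Nodes that host nothing, such as `Node5_large` here, do not contribute to the sum.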

We are now ready to formalize the notion of correctness of configuration. We first define a *provisional correctness*, considering only constraints on strong required and provided interfaces, and then we define a general notion of configuration correctness, considering also weak required interfaces and conflicts. The former is intended for transient configurations traversed during the execution of a reconfiguration, while the latter for the final configuration.

**Definition 5 (Provisionally correct configuration).** *A configuration* C = ⟨Z, T, N, B⟩ *is* provisionally correct *if, for each node* o ∈ ran(N)*, it holds*<sup>2</sup>

$$\forall r \in \mathcal{R}. \ o. \mathsf{res}(r) \ge \sum\_{z \in Z, N(z) = o} T(z). \mathsf{res}(r).$$

*and, for each microservice* z ∈ Z*, both following conditions hold:*

<sup>2</sup> Given a (partial) function *f*, we use ran(*f*) to denote the range of *f*, i.e., the function image set {*f*(*e*) | *e* ∈ dom(*f*)}.


**Definition 6 (Correct configuration).** *A configuration* C = ⟨Z, T, N, B⟩ *is* correct *if* C *is provisionally correct and, for each microservice* z ∈ Z*, both of the following conditions hold:*

- (p → n) ∈ T(z).reqw *implies that there exist* n *distinct microservices* z<sub>1</sub>, ..., z<sub>n</sub> ∈ Z \ {z} *such that, for every* 1 ≤ i ≤ n*, we have* ⟨p, z, z<sub>i</sub>⟩ ∈ B*;*
- p ∈ T(z).conf *implies that, for each* z′ ∈ Z \ {z}*, we have* p ∉ dom(T(z′).prov)*.*

Notice that, in the example in Fig. 1, the initial configuration (in continuous lines) is only provisionally correct in that the weak required interface MA (with arity 3) of the Message Receiver is not satisfied (because there is only one outgoing binding). The full configuration, which also includes the elements in dashed lines, is instead correct: all the constraints associated to the interfaces are satisfied.
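A partial checker for these notions can be sketched in Python. It covers only the conditions that are spelled out above: the per-node resource inequality of Definition 5 and the two conditions of Definition 6. The dictionary encoding and function names are assumptions made here; the RAM values of Message Receiver and Node1 large come from the text, the other quantities are hypothetical, and the Attachment Analyzer is left out to keep the example short.

```python
from collections import defaultdict


def resources_ok(Z, T, N, types, nodes):
    """Per node and resource kind, hosted demand must not exceed supply."""
    used = defaultdict(lambda: defaultdict(int))
    for z in Z:
        for r, amount in types[T[z]]["res"].items():
            used[N[z]][r] += amount
    return all(
        nodes[o]["res"].get(r, 0) >= amount
        for o, per_res in used.items()
        for r, amount in per_res.items()
    )


def weak_and_conflicts_ok(Z, T, B, types):
    """The two conditions of Definition 6; B holds triples (p, requirer, provider)."""
    for z in Z:
        t = types[T[z]]
        for p, n in t["reqw"].items():
            targets = {z2 for (q, z1, z2) in B if q == p and z1 == z and z2 != z}
            if len(targets) < n:  # fewer than n distinct providers bound
                return False
        for p in t["conf"]:
            if any(p in types[T[z2]]["prov"] for z2 in Z if z2 != z):
                return False
    return True


types = {
    "MessageReceiver": {"prov": set(), "reqw": {"MA": 3}, "conf": set(), "res": {"RAM": 4}},
    "MessageAnalyzer": {"prov": {"MA"}, "reqw": {}, "conf": set(), "res": {"RAM": 2}},  # RAM assumed
}
nodes = {"Node1_large": {"res": {"RAM": 4}}, "Node2_xlarge": {"res": {"RAM": 8}}}  # 8 assumed

# Initial configuration: one Message Analyzer bound to the Message Receiver.
Z0 = {"mr", "ma"}
T0 = {"mr": "MessageReceiver", "ma": "MessageAnalyzer"}
N0 = {"mr": "Node1_large", "ma": "Node2_xlarge"}
B0 = {("MA", "mr", "ma")}

# Full configuration: two more Message Analyzers (hypothetically on the same node).
Z1 = Z0 | {"ma2", "ma3"}
T1 = {**T0, "ma2": "MessageAnalyzer", "ma3": "MessageAnalyzer"}
N1 = {**N0, "ma2": "Node2_xlarge", "ma3": "Node2_xlarge"}
B1 = B0 | {("MA", "mr", "ma2"), ("MA", "mr", "ma3")}
```

With these inputs the initial configuration passes the resource check but fails the weak-requirement check, while the full one passes both, matching the discussion above.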

We now formalize how configurations evolve by means of atomic actions.

**Definition 7 (Actions).** *The set* A *contains the following actions:*


In our example, assuming that the initially available Attachment Analyzer is named aa, we have that the action to create the initial instance of Message Analyzer is *new*(ma, MessageAnalyzer, Node2 xlarge,(AA → {aa})). Notice that it is necessary to establish the binding with the Attachment Analyzer because of the corresponding strong required interface.

The execution of actions can now be formalized using a labeled transition system on configurations, which uses actions as labels.

<sup>3</sup> Given sets *S* and *S*′ we use: 2<sup>*S*</sup> to denote the power set of *S*, i.e., the set {*S*′ | *S*′ ⊆ *S*}; *S* − *S*′ to denote set difference; and |*S*| to denote the cardinality of *S*.

**Definition 8 (Reconfigurations).** *Reconfigurations are denoted by transitions* C −α→ C′ *meaning that the execution of* α ∈ A *on the configuration* C *produces a new configuration* C′*. The transitions from a configuration* C = ⟨Z, T, N, B⟩ *are defined as follows:*

	- C −*del*(*z*)→ ⟨*Z* \ {*z*}, *T*′, *N*′, *B*′⟩ *if* *T*′ = {(*z*′ → T) ∈ *T* | *z* ≠ *z*′} *and* *N*′ = {(*z*′ → *o*) ∈ *N* | *z* ≠ *z*′} *and* *B*′ = {⟨*p*, *z*<sub>1</sub>, *z*<sub>2</sub>⟩ ∈ *B* | *z* ∉ {*z*<sub>1</sub>, *z*<sub>2</sub>}}
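Reading the *del* rule as removing a microservice together with its type, its placement, and every binding that mentions it, the transition can be sketched as a pure function on 4-tuples. The tuple encoding and the name `delete` are assumptions of this sketch, as is the binding between `ma` and `aa` used in the sample data.

```python
def delete(config, z):
    """Return the configuration reached by del(z); the input is not modified."""
    Z, T, N, B = config
    return (
        Z - {z},
        {z2: t for z2, t in T.items() if z2 != z},                # T'
        {z2: o for z2, o in N.items() if z2 != z},                # N'
        {(p, z1, z2) for (p, z1, z2) in B if z not in (z1, z2)},  # B'
    )


initial = (
    {"mr", "ma", "aa"},
    {"mr": "MessageReceiver", "ma": "MessageAnalyzer", "aa": "AttachmentAnalyzer"},
    {"mr": "Node1_large", "ma": "Node2_xlarge", "aa": "Node2_xlarge"},  # aa placement assumed
    {("MA", "mr", "ma"), ("AA", "ma", "aa")},
)

after = delete(initial, "ma")
```

Deleting `ma` drops both bindings, since it appears as provider in one and as requirer in the other.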

A *deployment plan* is simply a sequence of actions that transform a provisionally correct configuration (without violating provisional correctness along the way) and, finally, reach a correct configuration.

**Definition 9 (Deployment plan).** *A* deployment plan P *from a provisionally correct configuration* C<sub>0</sub> *is a sequence of actions* α<sub>1</sub>, ..., α<sub>m</sub> *such that:*


*Deployment plans are also denoted with* C<sub>0</sub> −α<sub>1</sub>→ C<sub>1</sub> −α<sub>2</sub>→ ··· −α<sub>m</sub>→ C<sub>m</sub>*.*

In our example, a deployment plan that reconfigures the initial provisionally correct configuration into the final correct one is as follows: a *new* action to create the new instance of Attachment Analyzer, followed by two *new* actions for the new Message Analyzers (as commented above, the connection with the Attachment Analyzer is part of these *new* actions), and finally two *bind* actions to connect the Message Receiver to the two new instances of Message Analyzer.

We now have all the ingredients to define the *optimal deployment problem*, that is our main concern: given a universe of microservice types, a set of available nodes and an initial configuration, we want to know whether and how it is possible to deploy at least one microservice of a given microservice type T by optimizing the overall cost of nodes hosting the deployed microservices.

**Definition 10 (Optimal deployment problem).** *The* optimal deployment problem *has, as input, a finite well-formed universe* U *of microservice types, a finite set of available nodes* O*, an initial provisionally correct configuration* C<sub>0</sub> *and a microservice type* T<sub>t</sub> ∈ U*. The output is:*

	- *for all* C<sub>i</sub> = ⟨Z<sub>i</sub>, T<sub>i</sub>, N<sub>i</sub>, B<sub>i</sub>⟩*, with* 1 ≤ i ≤ m*, it holds* ∀z ∈ Z<sub>i</sub>. T<sub>i</sub>(z) ∈ U ∧ N<sub>i</sub>(z) ∈ O*, and*
	- C<sub>m</sub> = ⟨Z<sub>m</sub>, T<sub>m</sub>, N<sub>m</sub>, B<sub>m</sub>⟩ *satisfies* ∃z ∈ Z<sub>m</sub> : T<sub>m</sub>(z) = T<sub>t</sub>*;*

*if there exists one. In particular, among all deployment plans satisfying the constraints above, one that minimizes* ∑<sub>o ∈ O. (∃z. N<sub>m</sub>(z) = o)</sub> o.cost *(i.e., the overall cost of nodes in the last configuration* C<sub>m</sub>*) is output.*

*–* **no** *(stating that no such plan exists), otherwise.*

We are finally ready to state our main result on the decidability of the optimal deployment problem. To prove the result we describe an approach that splits the problem in three incremental phases: (1) the first phase checks if there is a possible solution and assigns microservices to deployment nodes, (2) the intermediate phase computes how the microservices need to be connected to each other, and (3) the final phase synthesizes the corresponding deployment plan.

**Theorem 1.** *The optimal deployment problem is decidable.*

*Proof.* The proof is in the form of an algorithm that solves the optimal deployment problem. We assume that the input to the problem to be solved is given by U (the microservice types), O (the set of available nodes), C<sub>0</sub> (the initial provisionally correct configuration), and T<sub>t</sub> ∈ U (the target microservice type). We use I(U) to denote the set of interfaces used in the considered microservice types, namely I(U) = ⋃<sub>T ∈ U</sub> dom(T.reqs) ∪ dom(T.reqw) ∪ dom(T.prov) ∪ T.conf. The algorithm is based on three phases.

*Phase 1* The first phase consists of the generation of a set of constraints that, once solved, indicates how many instances should be created for each microservice type T (denoted with inst(T)), how many of them should be deployed on node o (denoted with inst(T, o)), and how many bindings should be established for each interface p from instances of type T (considering both weak and strong required interfaces) to instances of type T′ (denoted with bind(p, T, T′)). We also generate an optimization function that guarantees that the generated configuration is minimal w.r.t. its total cost.

We now incrementally report the generated constraints. The first group of constraints deals with the number of bindings:

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, p \in \mathsf{dom}(\mathcal{T}.\mathsf{req}\_{\mathsf{s}})} \mathcal{T}.\mathsf{req}\_{\mathsf{s}}(p) \cdot \mathsf{inst}(\mathcal{T}) \le \sum\_{\mathcal{T}' \in U} \mathsf{bind}(p, \mathcal{T}, \mathcal{T}') \tag{1a}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, p \in \mathsf{dom}(\mathcal{T}.\mathsf{req}\_{\mathsf{w}})} \mathcal{T}.\mathsf{req}\_{\mathsf{w}}(p) \cdot \mathsf{inst}(\mathcal{T}) \le \sum\_{\mathcal{T}' \in U} \mathsf{bind}(p, \mathcal{T}, \mathcal{T}') \tag{1b}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, \mathcal{T}.\mathsf{prov}(p) < \infty} \mathcal{T}.\mathsf{prov}(p) \cdot \mathsf{inst}(\mathcal{T}) \ge \sum\_{\mathcal{T}' \in U} \mathsf{bind}(p, \mathcal{T}', \mathcal{T}) \tag{1c}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, \mathcal{T}.\mathsf{prov}(p) = \infty} \mathsf{inst}(\mathcal{T}) = 0 \ \Rightarrow \sum\_{\mathcal{T}' \in U} \mathsf{bind}(p, \mathcal{T}', \mathcal{T}) = 0 \tag{1d}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, p \notin \mathsf{dom}(\mathcal{T}.\mathsf{prov})} \sum\_{\mathcal{T}' \in U} \mathsf{bind}(p, \mathcal{T}', \mathcal{T}) = 0 \tag{1e}$$

Constraints 1a and 1b guarantee that there are enough bindings to satisfy all the required interfaces, considering both strong and weak requirements. Symmetrically, constraint 1c guarantees that the number of bindings is not greater than the total available capacity, computed as the sum of the single capacities of each provided interface. In case the capacity is unbounded (i.e., ∞), it is sufficient to have at least one instance that activates such a port to support any possible requirement (see constraint 1d). Finally, constraint 1e guarantees that no binding on an interface is established towards microservice types that do not provide that interface.
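The first group of constraints can be checked directly on candidate values of inst and bind. The sketch below is a plain checker, not a solver, and its dictionary encoding is an assumption made here. The sample data follows the running example (arity 3 on the weak MA interface, capacity 2 on the provided AA interface); the default arity 1 for the strong AA requirement and the unbounded capacity of the provided MA interface are assumed from the stated defaults.

```python
import math


def binding_counts_ok(universe, inst, bind):
    """Check constraints 1a-1e.

    universe: type -> {"reqs": {p: arity}, "reqw": {p: arity}, "prov": {p: capacity}}
    inst: type -> number of instances
    bind: (p, requirer type, provider type) -> number of bindings (default 0)
    """
    def outgoing(p, t):
        return sum(bind.get((p, t, t2), 0) for t2 in universe)

    def incoming(p, t):
        return sum(bind.get((p, t2, t), 0) for t2 in universe)

    interfaces = {
        p for d in universe.values()
        for kind in ("reqs", "reqw", "prov", "conf") for p in d.get(kind, ())
    }
    for t, d in universe.items():
        for kind in ("reqs", "reqw"):          # 1a and 1b: enough bindings
            for p, arity in d[kind].items():
                if arity * inst[t] > outgoing(p, t):
                    return False
        for p in interfaces:
            cap = d["prov"].get(p)
            if cap is None:                    # 1e: p is not provided by t
                if incoming(p, t) != 0:
                    return False
            elif cap == math.inf:              # 1d: unbounded, needs an instance
                if inst[t] == 0 and incoming(p, t) != 0:
                    return False
            elif cap * inst[t] < incoming(p, t):  # 1c: bounded capacity
                return False
    return True


universe = {
    "MR": {"reqs": {}, "reqw": {"MA": 3}, "prov": {}},
    "MA": {"reqs": {"AA": 1}, "reqw": {}, "prov": {"MA": math.inf}},
    "AA": {"reqs": {}, "reqw": {}, "prov": {"AA": 2}},
}
bind = {("MA", "MR", "MA"): 3, ("AA", "MA", "AA"): 3}

with_two_aa = binding_counts_ok(universe, {"MR": 1, "MA": 3, "AA": 2}, bind)
with_one_aa = binding_counts_ok(universe, {"MR": 1, "MA": 3, "AA": 1}, bind)
```

With a single Attachment Analyzer, constraint 1c fails because 2 · 1 < 3, which is the reason the example reconfiguration adds a second instance.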

The second group of constraints deals with the number of instances of microservices to be deployed.

$$\mathsf{inst}(T\_t) \ge 1 \tag{2a}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, p \in \mathcal{T}.\mathsf{conf}} \bigwedge\_{\mathcal{T}' \in U - \{\mathcal{T}\},\, p \in \mathsf{dom}(\mathcal{T}'.\mathsf{prov})} \mathsf{inst}(\mathcal{T}) > 0 \ \Rightarrow\ \mathsf{inst}(\mathcal{T}') = 0 \tag{2b}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U,\, p \in \mathcal{T}.\mathsf{conf},\, p \in \mathsf{dom}(\mathcal{T}.\mathsf{prov})} \mathsf{inst}(\mathcal{T}) \le 1 \tag{2c}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U} \bigwedge\_{\mathcal{T}' \in U - \{\mathcal{T}\}} \mathsf{bind}(p, \mathcal{T}, \mathcal{T}') \le \mathsf{inst}(\mathcal{T}) \cdot \mathsf{inst}(\mathcal{T}') \tag{2d}$$

$$\bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{\mathcal{T} \in U} \mathsf{bind}(p, \mathcal{T}, \mathcal{T}) \le \mathsf{inst}(\mathcal{T}) \cdot (\mathsf{inst}(\mathcal{T}) - 1) \tag{2e}$$

The first constraint 2a guarantees the presence of at least one instance of the target microservice. Constraint 2b guarantees that no two instances of different types will be created if one activates a conflict on an interface provided by the other one. Constraint 2c considers the other case, in which a type activates the same interface both in conflicting and provided modality: in this case, at most one instance of such a type can be created. Finally, constraints 2d and 2e guarantee that there are enough pairs of distinct instances to establish all the necessary bindings. Two distinct constraints are used: the first one deals with bindings between microservices of two different types, the second one with bindings between microservices of the same type.

The last group of constraints deals with the distribution of microservice instances over the available nodes O.

$$\mathsf{inst}(\mathcal{T}) = \sum\_{o \in O} \mathsf{inst}(\mathcal{T}, o) \tag{3a}$$

$$\bigwedge\_{r \in \mathcal{R}} \bigwedge\_{o \in O} \sum\_{\mathcal{T} \in U} \mathsf{inst}(\mathcal{T}, o) \cdot \mathcal{T}.\mathsf{res}(r) \le o.\mathsf{res}(r) \tag{3b}$$

$$\bigwedge\_{o \in O} \left( \sum\_{\mathcal{T} \in U} \mathsf{inst}(\mathcal{T}, o) > 0 \right) \Leftrightarrow \mathsf{used}(o) \tag{3c}$$

$$\min \sum\_{o \in O,\, \mathsf{used}(o)} o.\mathsf{cost} \tag{3d}$$

Constraint 3a simply formalizes the relationship among the variables inst(T ) and inst(T , o) (the total amount of all instances of a microservice type, should correspond to the sum of the instances locally deployed on each node). Constraint 3b checks that each node has enough resources to satisfy the requirements of all the hosted microservices. The last two constraints define the optimization function used to minimize the total cost: constraint 3c introduces the boolean variable used(o) which is true if and only if node o contains at least one microservice instance; constraint 3d is the function to be minimized, i.e., the sum of the costs of the used nodes.

These constraints, and the optimization function, are expected to be given in input to a constraint/optimization solver. If a solution is not found it is not possible to deploy the required microservice system; otherwise, the next phases of the algorithm are executed to synthesize the optimal deployment plan.
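To make the third group of constraints concrete, the sketch below replaces the constraint solver with exhaustive search, which is only viable for toy inputs. It takes the instance counts inst(T) as given and looks for a distribution over nodes that respects the resource bound (3b) and minimizes the cost of the used nodes (3c, 3d). All names, types, nodes, and numbers in it are hypothetical.

```python
from itertools import product


def cheapest_placement(inst, need, nodes):
    """Exhaustively search a minimum-cost distribution of instances over nodes.

    inst: type -> number of instances to deploy
    need: type -> {resource: amount per instance}
    nodes: node -> {"res": {resource: amount}, "cost": number}
    Returns (cost, inst_on) with inst_on[(type, node)] = count, or None.
    """
    instances = [t for t, n in inst.items() for _ in range(n)]
    names = list(nodes)
    best = None
    for choice in product(names, repeat=len(instances)):
        load = {o: {} for o in names}
        for t, o in zip(instances, choice):
            for r, amount in need[t].items():
                load[o][r] = load[o].get(r, 0) + amount
        # Constraint 3b: the hosted demand must fit in each node.
        if any(amount > nodes[o]["res"].get(r, 0)
               for o in names for r, amount in load[o].items()):
            continue
        used = set(choice)                          # constraint 3c
        cost = sum(nodes[o]["cost"] for o in used)  # objective 3d
        if best is None or cost < best[0]:
            inst_on = {}
            for t, o in zip(instances, choice):
                inst_on[(t, o)] = inst_on.get((t, o), 0) + 1
            best = (cost, inst_on)                  # 3a holds by construction
    return best


toy_nodes = {
    "small1": {"res": {"CPU": 2}, "cost": 10},
    "small2": {"res": {"CPU": 2}, "cost": 10},
    "big": {"res": {"CPU": 4}, "cost": 15},
}
toy_need = {"A": {"CPU": 2}, "B": {"CPU": 1}}
result = cheapest_placement({"A": 1, "B": 2}, toy_need, toy_nodes)
```

In this toy instance a single `big` node is cheaper than two `small` ones, so all three instances end up on it.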

*Phase 2* The second phase consists of the generation of another set of constraints that, once solved, indicates the bindings to be established between any pair of microservices to be deployed. More precisely, for each type T such that inst(T) > 0, we use s<sup>T</sup><sub>i</sub>, with 1 ≤ i ≤ inst(T), to identify the microservices of type T to be deployed. We also assume a function N that associates microservices to available nodes O, which is compliant with the values inst(T, o) already computed in Phase 1, i.e., given a type T and a node o, the number of s<sup>T</sup><sub>i</sub>, with 1 ≤ i ≤ inst(T), such that N(s<sup>T</sup><sub>i</sub>) = o coincides with inst(T, o).

In the constraints below we use the variables b(p, s<sup>T</sup><sub>i</sub>, s<sup>T′</sup><sub>j</sub>) (with i ≠ j, if T = T′): the value of such a variable is 1 if there is a connection between the required interface p of s<sup>T</sup><sub>i</sub> and the provided interface p of s<sup>T′</sup><sub>j</sub>, 0 otherwise. We use n and m to denote inst(T) and inst(T′), respectively, and an auxiliary total function *limProv*(T, *p*) that extends T.prov associating 0 to interfaces outside its domain.

$$\bigwedge\_{T \in U} \bigwedge\_{p \in \mathcal{I}(U)} \bigwedge\_{i \in 1..n} \sum\_{j \in \{1..m\} \backslash \{i \mid T = T'\}} \mathsf{b}(p, s\_i^T, s\_j^{T'}) \le \mathit{limProv}(T', p) \tag{4a}$$

$$\bigwedge\_{T \in U} \bigwedge\_{p \in \mathsf{dom}(T.\mathsf{req}\_{\mathsf{s}})} \bigwedge\_{i \in 1..n} \sum\_{j \in \{1..m\} \backslash \{i \mid T = T'\}} \mathsf{b}(p, s\_i^T, s\_j^{T'}) \ge T.\mathsf{req}\_{\mathsf{s}}(p) \tag{4b}$$

$$\bigwedge\_{T \in U} \bigwedge\_{p \in \mathsf{dom}(T.\mathsf{req}\_{\mathsf{w}})} \bigwedge\_{i \in 1..n} \sum\_{j \in \{1..m\} \backslash \{i \mid T = T'\}} \mathsf{b}(p, s\_i^T, s\_j^{T'}) \ge T.\mathsf{req}\_{\mathsf{w}}(p) \tag{4c}$$

$$\bigwedge\_{T \in U} \bigwedge\_{p \notin \mathsf{dom}(T.\mathsf{req}\_{\mathsf{s}}) \cup \mathsf{dom}(T.\mathsf{req}\_{\mathsf{w}})} \bigwedge\_{i \in 1..n} \sum\_{j \in \{1..m\} \backslash \{i \mid T = T'\}} \mathsf{b}(p, s\_i^T, s\_j^{T'}) = 0 \tag{4d}$$

Constraint 4a considers the provided interface capacities to fix upper bounds to the bindings to be established, while constraints 4b and 4c fix lower bounds based on the required interface capacities, considering both the strong (see 4b) and the weak (see 4c) ones. Finally, constraint 4d indicates that it is not possible to establish connections on interfaces that are not required.

A solution for these constraints exists because, as also shown in [13], the constraints 1a ... 2e (already solved during Phase 1) guarantee that the configuration to be synthesized contains enough capacity on the provided interfaces to satisfy all the required interfaces.
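For small instances, the concrete bindings of Phase 2 can also be found by backtracking instead of a constraint solver. The sketch below is such a stand-in, under assumptions made here: requirements are given per instance as lower bounds, capacities per provider instance as upper bounds, and each requirer is bound to exactly as many distinct providers as its arity. The instance names and the unbounded MA capacity are hypothetical; the arity 3 and the capacity 2 follow the running example.

```python
import math
from itertools import combinations


def assign_bindings(requirers, capacity):
    """Search for bindings meeting every arity without exceeding any capacity.

    requirers: list of (instance, interface, arity)
    capacity: {(provider instance, interface): max incoming bindings}
    Returns a set of (interface, requirer, provider) triples, or None.
    """
    remaining = dict(capacity)

    def solve(k):
        if k == len(requirers):
            return set()
        z, p, n = requirers[k]
        candidates = [s for (s, q), c in remaining.items()
                      if q == p and s != z and c > 0]
        for chosen in combinations(candidates, n):  # n distinct providers
            for s in chosen:
                remaining[(s, p)] -= 1
            rest = solve(k + 1)
            for s in chosen:                        # undo before moving on
                remaining[(s, p)] += 1
            if rest is not None:
                return rest | {(p, z, s) for s in chosen}
        return None

    return solve(0)


requirers = [("mr", "MA", 3), ("ma1", "AA", 1), ("ma2", "AA", 1), ("ma3", "AA", 1)]
capacity = {
    ("ma1", "MA"): math.inf, ("ma2", "MA"): math.inf, ("ma3", "MA"): math.inf,
    ("aa1", "AA"): 2, ("aa2", "AA"): 2,
}
solution = assign_bindings(requirers, capacity)
```

Dropping `aa2` from `capacity` makes the search return `None`, mirroring the need for a second Attachment Analyzer in the example.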

*Phase 3* In this last phase we synthesize the deployment plan that, when applied to the initial configuration C<sub>0</sub>, reaches a new configuration C<sub>t</sub> with nodes, microservices and bindings as computed in the first two phases of the algorithm. Without loss of generality, in this decidability proof we show the existence of a simple plan that first removes the elements in the initial configuration and then deploys the target configuration from scratch. However, as also discussed in Sect. 3, in practice it is possible to define more complex planning mechanisms that re-use microservices already deployed.

Reaching an empty configuration is a trivial task since it is always possible to perform in the initial configuration unbind actions for all the bindings connected to weak required interfaces. Then, the microservices can be safely deleted. Thanks to the well-formedness assumption (Definition 2) and using a topological sort, it is possible to order the microservices to be removed without violating any strong required interface (e.g., first remove the microservice not requiring anything and repeat until all the microservices have been deleted).

The deployment of the target configuration follows a similar pattern. Given the distribution of microservices over nodes (computed in the first phase) and the corresponding bindings (computed in the second phase), the microservices can be created by following a topological sort considering the microservices dependencies following from the strong required interfaces. When all the microservices are deployed on the corresponding nodes, the remaining bindings (on weak required ports) may be added in any possible order.
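The creation half of this plan can be sketched with a topological sort over strong dependencies, followed by the weak bindings. The encoding is an assumption of this sketch: the `new` tuples follow the shape of the *new* action shown earlier, while the `("bind", p, requirer, provider)` shape and all function and variable names are hypothetical. It uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter


def synthesize_plan(types, placement, bindings, strong):
    """Build a creation plan for a target configuration, starting from empty.

    types: instance -> type name; placement: instance -> node
    bindings: set of (interface, requirer, provider)
    strong: set of (type name, interface) pairs that are strong requirements
    """
    strong_b = {b for b in bindings if (types[b[1]], b[0]) in strong}
    # Each instance depends on the providers of its strong requirements.
    deps = {z: {s for (_, r, s) in strong_b if r == z} for z in types}
    plan = []
    for z in TopologicalSorter(deps).static_order():  # providers come first
        initial = {}
        for p, r, s in sorted(strong_b):
            if r == z:
                initial.setdefault(p, set()).add(s)
        plan.append(("new", z, types[z], placement[z], initial))
    for p, r, s in sorted(bindings - strong_b):       # weak bindings, any order
        plan.append(("bind", p, r, s))
    return plan


types = {"mr": "MessageReceiver", "ma": "MessageAnalyzer", "aa": "AttachmentAnalyzer"}
placement = {"mr": "Node1_large", "ma": "Node2_xlarge", "aa": "Node2_xlarge"}  # aa assumed
bindings = {("MA", "mr", "ma"), ("AA", "ma", "aa")}
strong = {("MessageAnalyzer", "AA")}

plan = synthesize_plan(types, placement, bindings, strong)
```

Because the universe is well-formed, the strong dependencies among instances are acyclic and `static_order()` always succeeds.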

*Remark 1.* The constraints generated during Phase 2 of the algorithm, in order to establish the microservice bindings, are expected to be given in input to a constraint/optimization solver. One can enrich such constraints with metrics to optimize, e.g., the number of local bindings (i.e., give a preference to the connections among microservices hosted in the same node):

$$\min \sum\_{T, T' \in U,\, i \in 1..\mathsf{inst}(T),\, j \in 1..\mathsf{inst}(T'),\, p \in \mathcal{I}(U),\, N(s\_i^T) \neq N(s\_j^{T'})} \mathsf{b}(p, s\_i^T, s\_j^{T'}) $$

Another example, used in the case study discussed in Sect. 3, is the following metric that maximizes the number of bindings<sup>4</sup>:

$$\max \sum\_{s\_i^{T}, s\_j^{T'}, p \in \mathcal{I}(U)} \mathsf{b}(p, s\_i^{T}, s\_j^{T'}) $$
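The first metric of Remark 1 simply counts the bindings whose endpoints sit on different nodes. A minimal sketch of that count follows, with a hypothetical function name and hypothetical candidate data; a solver would minimize this quantity over all admissible binding sets rather than compare two hand-written ones.

```python
def remote_bindings(bindings, placement):
    """Number of bindings (p, requirer, provider) that cross node boundaries."""
    return sum(1 for (_, r, s) in bindings if placement[r] != placement[s])


placement = {"ma1": "n1", "ma2": "n2", "aa1": "n1", "aa2": "n2"}  # hypothetical
local_first = {("AA", "ma1", "aa1"), ("AA", "ma2", "aa2")}
crossed = {("AA", "ma1", "aa2"), ("AA", "ma2", "aa1")}

preferred = min((local_first, crossed), key=lambda b: remote_bindings(b, placement))
```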

From the complexity point of view, it is possible to show that the decision versions of the optimization problems solved in Phase 1 and in Phase 2 are NP-complete and in NP, respectively, while the planning in Phase 3 is synthesized in polynomial time. Unfortunately, due to the fact that numeric constraints can be represented in log space, the output of Phase 2, which requires the enumeration of all the microservices to deploy, can be exponential in the size of the output of Phase 1 (indicating only the total number of instances for each type). For this reason, the optimal deployment problem is in NEXPTIME. However, we consider the deployment of an exponential number of microservices on one node having limited resources infeasible in practice. If at most a polynomial number of microservices can be deployed on each node, we have that the optimal deployment problem becomes an NP-optimization problem and its decision version is NP-complete. See the companion technical report [8] for the formal proofs of complexity.

<sup>4</sup> We model a load balancer as a microservice having a weak required interface, with arity 0, that can be provided by its back-end service. By adopting the above maximization metric, the synthesized configuration connects all possible services to such required interface, thus allowing the load balancer to forward requests to all of them.

**Fig. 2.** Microservice architecture for email processing.

## **3 Application of the Technique to the Case-Study**

Given the asymptotic complexity of our solution (NP under the assumption of polynomial size of the target configuration) we have decided to evaluate its applicability in practice by considering a real-world microservice architecture, namely the email processing pipeline described in [22]. The considered architecture separates and routes the components found in an email (headers, links, text, attachments) into distinct, parallel sub-pipelines with specific tasks (e.g., remove malicious attachments, tag the content of the mail). We report in Fig. 2 a depiction of the architecture. When an email reaches the Message Receiver it is forwarded to the Message Parser, which sends each component into a specific sub-pipeline. In the sub-pipelines, some microservices — e.g., Text Analyzer and Attachment Analyzer — coordinate with other microservices — e.g., Sentiment Analyzer and Virus Scanner — to process their inputs. Each microservice in the architecture has a given resource consumption (expressed in terms of CPU and memory). As expected, the processing of each email component entails a specific load. Some microservices can handle large inputs, e.g., in the range of 40K simultaneous requests (e.g., Header Analyzer that processes short and uniform inputs). Other microservices sustain heavier computations (e.g., Image Recognizer) and can handle smaller simultaneous inputs, e.g., in the range of 10K requests.

To model the system above, we use the Abstract Behavioral Specification (ABS) language, a high-level object-oriented language that supports deployment modeling [31]. ABS is agnostic w.r.t. deployment platforms (Amazon AWS, Microsoft Azure) and technologies (e.g., Docker or Kubernetes) and it offers high-level deployment primitives for the creation of new *deployment components* and the instantiation of objects inside them. Here, we use ABS deployment components as computation nodes, ABS objects as microservice instances, and ABS object references as bindings. Finally, to describe the requirements in our model, we use ABS with SmartDepl [25], an extension that supports deployment annotations. Strong required interfaces are modeled as class annotations indicating mandatory parameters for the class constructor: such parameters contain the references to the objects corresponding to the microservices providing the strongly required interfaces. Weak required interfaces are expressed as annotations concerning specific methods used to pass, to an already instantiated object, the references to the objects providing the weakly required interfaces. We define a class for each microservice type, plus one *load balancer* class for each microservice type. A load balancer distributes requests over a set of instances that can scale horizontally. Finally, we model nodes corresponding to Amazon EC2 instances: c4 large, c4 xlarge, and c4 2xlarge (with the corresponding provided resources and costs).


In the table above, we report the result of our algorithm w.r.t. four incremental deployments: the initial one in column 2 and those under incremental loads in columns 3–5. We also consider an availability of 40 nodes for each of the three node types. In the first column of the table, next to a microservice type, we report its corresponding maximum computational load, i.e., the maximal number of simultaneous requests that it can manage. As visible in columns 2–5, different maximal computational loads imply different scaling factors w.r.t. a given number of simultaneous requests. In the initial configuration we consider 10K simultaneous requests and we have one instance of each microservice type (and of the corresponding load balancer). The other deployment configurations deal with three scenarios of horizontal scaling, assuming three increasing increments of inbound messages (20K, 50K, and 80K). In the three scaling scenarios, we do not implement the planning algorithm described in Phase 3 of the proof of Theorem 1. Instead, we take advantage of the presence of the load balancers and, as described in Remark 1, we achieve a similar result with an optimization function that maximizes the number of bindings of the load balancers. For every scenario, we use SmartDepl [33] to generate the ABS code for the plan that deploys an optimal configuration, setting a timeout of 30 min for the computation of every deployment scenario.<sup>5</sup> The ABS code modeling the system and the generated code are publicly available at [7]. A graphical representation of the initial configuration is available in the companion technical report [8].
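As a rough illustration of how maximal computational loads translate into scaling factors, the following Python sketch computes a lower bound on the number of instances a microservice type needs for a given number of simultaneous requests. The function name `min_instances`, the ceiling-division rule, and the request counts are assumptions made for illustration only; the configurations reported here are computed by the optimizer, which also accounts for resources, costs, and bindings.

```python
def min_instances(simultaneous_requests, max_load):
    """Lower bound on instances when each one handles at most max_load requests."""
    if max_load <= 0:
        raise ValueError("max_load must be positive")
    # Ceiling division on integers, avoiding floating-point rounding.
    return -(-simultaneous_requests // max_load)

# Hypothetical request counts and maximal loads in the ranges quoted in the text.
for requests in (10_000, 20_000, 50_000, 80_000):
    print(requests, min_instances(requests, 10_000), min_instances(requests, 40_000))
```

A microservice with a smaller maximal load needs more instances for the same number of requests, which is why the scaling factors differ across microservice types.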

## **4 Related Work and Conclusion**

In this work, we consider a fundamental building block of modern Cloud systems, microservices, and prove that the generation of a deployment plan for an architecture of microservices is decidable and fully automatable; spanning from the synthesis of the optimal configuration to the generation of the deployment actions. To illustrate our technique, we model a real-world microservice architecture in the ABS [31] language and we compute a set of deployment plans.

The context of our work regards automating Cloud application deployment, for which there exist many specification languages [5,11], reconfiguration protocols [6,19], and system management tools [26,32,37,38]. Those tools support the specification of deployment plans but they do not support the automatic distribution of software instances over the available machines. The proposals closest to ours are those by Feinerer [20] and by Fischer et al. [21]. Both proposals rely on a solver to plan deployments. The first is based on the UML component model, which includes conflicts and dependencies, but lacks the modeling of nodes. The second does not support conflicts in the specification language. Neither proposal supports the computation of optimal deployments.

Three projects inspire our proposal: Aeolus [13,14], Zephyrus [1], and ConfSolve [28]. The Aeolus model paved the way to reason on deployment and reconfiguration, proving some decidability results. Zephyrus is a configuration tool based on Aeolus and it constitutes the first phase of our approach. ConfSolve is a tool for the optimal allocation of virtual machines to servers and of applications to virtual machines. Neither of these two tools synthesizes deployment plans.

<sup>5</sup> Here, 30 min are a reasonable timeout since we predict different system loads and we compute in advance a different deployment plan for each of them. An interesting future work would aim at shortening the computation to a few minutes (e.g., around the average start-up time of a virtual machine in a public Cloud) to obtain on-the-fly deployment plans tailored to unpredictable system loads.

Regarding autoscaling, existing solutions [2,4,17,29] support the automatic increase or decrease of the number of instances of a service/container, when some conditions (e.g., CPU average load greater than 80%) are met. Our work is an example of how we can go beyond single-component horizontal scaling policies (as analyzed, e.g., in [9]).

As future work, we want to investigate local search approaches to speed up the solution of the optimization problems behind the computation of a deployment plan. Shorter computation times would open our approach to contexts where it is infeasible to compute plans ahead of time, e.g., due to unpredictable loads.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Data Flow Model with Frequency Arithmetic**

Paul Dubrulle(✉), Christophe Gaston, Nikolai Kosmatov, Arnault Lapitre, and Stéphane Louise

CEA, List, 91191 Gif-sur-Yvette, France {paul.dubrulle,christophe.gaston,nikolai.kosmatov,arnault.lapitre,stephane.louise}@cea.fr

**Abstract.** Data flow formalisms are commonly used to model systems in order to solve problems of buffer sizing and task scheduling. A prerequisite for static analysis of a modeled system is the existence of a periodic schedule in which the sizes of communication channels can be bounded for an unbounded execution (consistency), and that communication dependencies do not introduce a deadlock in such an execution (liveness). In the context of Cyber-Physical Systems, components are often interfaced with the physical world and have frequency constraints. The existing data flow formalisms lack expressiveness to fully cover the expected behavior of these components. We propose an extension to Synchronous Data Flow (SDF) formalism, called Polygraph, that includes frequency constraints and adjustable communication rates. We show that with these extensions, the conditions for a model to be consistent and live are no longer sufficient, and we extend the corresponding theorems with necessary and sufficient conditions to preserve these properties. We also introduce a framework to check the liveness of a Polygraph model, implemented in the tool DIVERSITY, along with preliminary experiments to validate this approach.

## **1 Introduction**

*Context.* Cyber-Physical Systems (CPS) are increasingly present in everyday life. In these systems, the components require a certain amount of input data to produce a known amount of output data, and some of them must do so in synchrony with a reference time scale. For example, the next generation of autonomous vehicles will heavily rely on sensor fusion systems to operate the car. Sensors and actuators have specified frequencies. To produce its output, the fusion kernel requires a certain number of samples from several sources, with a temporal correlation between them.

Often, when implementing this kind of system, the prediction of its performance is important to the system designer. The performance prediction covers different characteristics of the system, including its throughput, memory footprint, and latency. In distributed implementations of such systems, an analysis of the communications between the components is necessary to configure a network capable of meeting the application's real-time requirements.

Data flow formalisms [3,14] can be used to perform this kind of performance analysis [4,5,10–12]. A prerequisite to analyze a model is the existence of a periodic schedule with two properties. The first property, *consistency*, requires that the sizes of the communication buffers remain bounded for an unbounded execution of the periodic schedule. In practice, if a model is not consistent, it is not possible to implement the communications without losing data samples. The second property, *liveness*, requires the absence of deadlocks in the schedule.

*Motivation and Goals.* The limitation of the existing data flow formalisms to model the considered systems is the lack of expressiveness regarding the synchronization on a common time scale for different components. Overcoming this limitation is the subject of recent research work [6]. Our goal is to extend an existing data flow formalism for which the consistency and liveness properties of a given model are decidable. In doing so, we want to ensure that the expressiveness extension does not impact the decidability of these properties. With this extension, all applicative constraints are taken into account when checking the prerequisites for a performance analysis. The verification can be performed in abstraction of a particular implementation's characteristics (like execution times or mapping), and the results are the same for different implementations. Moreover, the performance analysis can benefit from the additional information on the system provided by the extension.

*Approach and Main Results.* This paper introduces Polygraph, an extension to Synchronous Data Flow (SDF) [14] for specification of frequency constraints on the components. We use an arithmetic based on rational numbers to reason on data exchanges between components. We show that the theorems that provide a theoretical foundation for practical verification of consistency and liveness for an SDF model can be generalized to this new formalism. Finally, we propose a symbolic execution framework to decide the liveness of models expressed in Polygraph, in a way similar to [11,14].

The contributions of this work include:


*Outline.* The remainder of this paper is organized as follows. Section 2 gives an informal introduction to the proposed modeling approach, with a step-by-step explanation relying on an illustrative system. In Sect. 3, we formalize Polygraph and provide extended statements and a sketch of proof for the consistency and liveness theorems. Section 4 presents a framework to check the liveness property for Polygraph and a preliminary evaluation. In Sect. 5, we discuss related work, while Sect. 6 presents conclusion and perspectives.

**Fig. 1.** Motivating example: a data fusion system modeled as a data flow graph. The upper indexes "a" to "d" denote an amount of data exchanged by the components in different variants of the model. The rates denoted by upper index "d" are those of Polygraph, and initial conditions for this configuration are denoted by (i) and (ii).

## **2 Motivation and Running Example**

*Running Example.* To introduce the modeling approach behind Polygraph, we use a toy example of a data fusion system that could be integrated into the cockpit display of a car, depicted in Fig. 1. The system is composed of three sensors producing data samples to be used by a data fusion component, and a display component. The function of the sensor components is to read the data from their sensors, while the function of the data fusion component is to compute a result based on this data. The function of the display component is to render the fusion result on a screen. To do so, the sensor components send the data to the fusion component, and the fusion component sends the result to the display component. The first sensor component is a video camera producing frames. The other two sensor components analyze radar and lidar based samples to produce a descriptor of the closest detected obstacles. The fusion component uses this information to draw the obstacle descriptors on the corresponding frame.

The first step to model this system is to build a graph capturing data dependencies between the components. Each vertex of this graph models an *actor*, an abstract entity representing the function of a component. Each directed edge of the graph models a communication *channel*, the source actor being the producer of data consumed by the destination actor. The structure of the graph in Fig. 1 illustrates the dependencies in our example. The communication policy on the channels is First-In First-Out (FIFO), the write operation is non-blocking, and the read operation is blocking. On each channel, the atomic amount of data exchanged by the connected actors is called a *token*, and all write and read operations are measured in tokens. An actor *produces* (resp. *consumes*) a certain number of tokens on a channel when it writes (resp. reads) the corresponding amount of data. With this policy, the graph can be assimilated to a Kahn Process Network (KPN) [13]. In a KPN, the communications are determinate, but in general it is not possible to decide if the sizes of the channels can be bounded for an unbounded execution of the system.

*Synchronous and Asynchronous Constraints.* In practice, sensors and actuators have a fixed sampling rate, and the production of each data sample occurs at that specified frequency. To model these constraints, we propose to label some actors with *frequencies*, corresponding to the real-life constraint. An actor with a frequency label must *fire* at that frequency. We further detail this notion of firing below, but for now it is sufficient to say that the firing of an actor is an atomic process, during which it performs the actions and communications expected from the modeled component. A global clock provides ticks to synchronize the firing of frequency labeled actors. For our example, we consider the frequency labeling illustrated by Fig. 1.

Generally, in real-life systems, computation kernels compute when input data is available and do not have frequency constraints. In our frequency labeling, the actors modeling such components can be left without a frequency label. In our example, this is the case for the fusion actor.

The possibility of having unlabeled actors is an important part of our approach, as further discussed in Sect. 5. It allows mixing a synchronous firing policy for labeled actors with an asynchronous firing policy for unlabeled actors. This means that the scheduling of firings has periodic constraints only where needed, which offers more options for optimization algorithms.

*Static Rates.* Another characteristic of real-life software components in our context is that they require a fixed number of input samples from each different source. Also, there must be a correlation between the production time of the samples consumed from different sources. In our example, the fusion component requires one token from each sensor, and these samples must have a close-enough production time. This constraint can be captured by KPN restrictions, such as Synchronous Data Flow (SDF) [14]. In SDF, both ends of each channel are assigned a communication rate, denoting the fixed number of tokens produced or consumed by the connected actors' firings. This characteristic makes it possible to decide whether the sizes of the channels are bounded for an unbounded execution. Graphs respecting this property are said to be *consistent*.

Without taking frequencies into account, the communication rates denoted by an upper index "a" in Fig. 1 match the description of the system. Indeed, the sensor actors produce one token each, the fusion actor consumes these tokens, and in turn produces one token to be consumed by the display actor. With these rates, for a marking of the graph with any number of tokens stored in the channels, firing all the actors once leaves the same number of tokens in the channels. Hence, the SDF graph is consistent. But when taking frequencies into account, the graph is no longer consistent. In this example, the camera produces 30 tokens per second, the radar produces 120 tokens per second, and the lidar produces 10 tokens per second. This means that per second, because of the production rate and frequency of the lidar, the fusion actor will be able to fire only 10 times. It will consume only 10 tokens from the camera and radar actors, leaving 20 and 110 unconsumed tokens per second on their respective channels. Hence, it is no longer possible to bound the size of these channels for an unbounded execution of the graph. This shows that to achieve consistency, for any frequency labeled actor, the number of asynchronous firings of its unlabeled predecessors and successors should be limited.
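The arithmetic behind this inconsistency can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary layout and variable names are assumptions, while the frequencies are those quoted above for the running example.

```python
# Tokens produced per second by each sensor (rate 1 per firing, variant "a").
produced_per_second = {"camera": 30, "radar": 120, "lidar": 10}

# The fusion actor needs one token from every sensor per firing, so its
# number of firings per second is limited by the slowest producer.
fusion_firings = min(produced_per_second.values())

# Tokens left unconsumed on each channel after one second.
leftover = {name: rate - fusion_firings
            for name, rate in produced_per_second.items()}
print(fusion_firings)  # 10
print(leftover)        # {'camera': 20, 'radar': 110, 'lidar': 0}
```

Since the leftover on the camera and radar channels is strictly positive every second, the content of these channels grows without bound.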

A possible adaptation of communication rates, denoted by upper index "b" in Fig. 1, takes frequency inheritance into account and restores the consistency property. With the production and consumption rates both set to 1 on the channel connecting the camera and the fusion actors, the fusion actor basically inherits a frequency constraint of 30 Hz. It inherits the same frequency constraint from the radar and lidar actors since it now consumes 4 × 30 = 1 × 120 tokens per second from the radar, and 1 × 30 = 3 × 10 tokens per second from the lidar. The rates on the channel connecting the fusion and display actors are also balanced. But with these rates, the number of tokens does not reflect accurately the expected behavior of the modeled components. For example, the fusion actor would consume 4 tokens per activation from the radar actor, while in reality the component only requires 1.
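The frequency inheritance described in this paragraph amounts to a balance condition on each channel: tokens produced per second must equal tokens consumed per second. The Python sketch below checks it for the three sensor channels; the helper `inherited_frequency` and the tuple layout are hypothetical names introduced for illustration, and the display channel is omitted because its frequency is only given in Fig. 1.

```python
from fractions import Fraction

def inherited_frequency(prod_rate, prod_freq, cons_rate):
    """Firing frequency the consumer needs to drain the channel exactly."""
    return Fraction(prod_rate * prod_freq, cons_rate)

# (production rate, producer frequency in Hz, consumption rate of fusion)
variant_a = {"camera": (1, 30, 1), "radar": (1, 120, 1), "lidar": (1, 10, 1)}
variant_b = {"camera": (1, 30, 1), "radar": (1, 120, 4), "lidar": (3, 10, 1)}

def fusion_frequencies(channels):
    return {name: inherited_frequency(*spec) for name, spec in channels.items()}

freq_a = fusion_frequencies(variant_a)
freq_b = fusion_frequencies(variant_b)

# The graph can only be consistent if all channels impose the same frequency.
print(len(set(freq_a.values())) == 1)  # False: 30, 120 and 10 Hz conflict
print(len(set(freq_b.values())) == 1)  # True: every channel imposes 30 Hz
```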

*Cyclo-Static Rates.* It is possible to use Cyclo-Static Data Flow (CSDF) [3] to get closer to the real communication requirements. In CSDF, the rates of the actors are fixed as in SDF, but the successive firings of an actor cyclically consume and produce a different number of tokens on every connected channel. The successive rates on each channel are expressed as a sequence of natural numbers. For example, an actor with a cyclo-static sequence of output rates [1, 2] produces 1 token for its first firing, 2 tokens for the second, 1 for the third and so on. A zero rate may occur in the sequence, meaning that the actor does not push or pull tokens on the channel for the corresponding firing.
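The cyclic indexing of a cyclo-static sequence can be written down directly. The following Python sketch, with a hypothetical helper name, returns the rate used by the n-th firing of an actor (firings counted from 1).

```python
def cyclo_static_rate(sequence, firing):
    """Tokens exchanged by the given firing (1-based) for a cyclo-static sequence."""
    return sequence[(firing - 1) % len(sequence)]

# The output rates [1, 2] from the text, over the first five firings.
rates = [cyclo_static_rate([1, 2], n) for n in range(1, 6)]
print(rates)  # [1, 2, 1, 2, 1]
```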

A cyclo-static sequence is necessary on a channel if the connected actors have frequency constraints conflicting with the expected communication behavior. In this case, we propose that one of the actors must be chosen as having the reference frequency for the communication, and the other actor must adapt its communication rate to a cyclo-static sequence accordingly. Back to our example (see variant "c" in Fig. 1), the fusion actor requires one token from each sensor every firing. Since the component is synchronized on camera frames, we decide that the actor's reference frequency should be 30 Hz. In this case, the frequency constraints do not conflict with the expected communication behavior, and we assign production and consumption rates of 1 on the channel connecting the fusion and camera actors. Now, considering the radar actor, the fusion actor only requires 30 tokens per second out of 120. Considering this ratio, we assign the sequence [0, 0, 0, 1] as production rates for the radar actor, and the rate 1 for the fusion actor. The same logic applies for the lidar actor, the fusion actor requires 30 tokens per second, but only 10 tokens per second are produced. We then assign the cyclo-static sequence [1, 0, 0] as consumption rates for the fusion actor, and the rate 1 for the lidar actor. A similar logic is applied for the display actor. The consequence on the stream of actual data values highly depends on the implemented function, and is therefore out of the scope of the data flow modeling. In the particular case of the radar actor in our example, the software implementation could perform a downsampling of the sensed data, or just send the latest sample.

**Fig. 2.** Firings of actors of the motivating example: the firings are identified by the initial letter of the corresponding actor and the rank of the firing, arrows show data dependencies between firings, and a reference time scale constrains the firing of timed actors. The data dependencies marked by a cross in (a) introduce a causality issue.

The corresponding communication rates, denoted by upper index "c" in Fig. 1, give a graph where only the required tokens are exchanged on the channels, and the consistency property is preserved. But in general, choosing the appropriate cyclic rate sequences for all the channels in a graph is time-consuming and error-prone.

*Rational Rates.* We propose instead to extend the SDF model with rational communication rates. A rational communication rate r = p/q specifies that the actor produces or consumes p tokens every q firings, and the natural number of tokens produced or consumed by any firing is r rounded either up or down, denoted $\lceil r \rceil$ and $\lfloor r \rfloor$ respectively. With the semantics formalized in the next section, there is a unique default cyclo-static sequence that corresponds to a given rational rate. The default sequences for the rates denoted by an upper index "d" in Fig. 1 are those denoted by upper index "c". As explained earlier when assigning cyclo-static sequences, in this extension, only one rate on a given channel can be a rational number with denominator greater than one. The methodology remains the same: for any channel, one actor's frequency is considered as a reference, and the other one adapts its rates according to that reference.
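To make the correspondence between a rational rate and its default sequence concrete, here is a minimal Python sketch. It assumes the channel-state semantics of Sect. 3, where the rates of the firing actors are accumulated in a rational number and the token count is that number rounded down; the function names are hypothetical. Under this assumption, a producer emits a token whenever the accumulated rate crosses an integer, and a consumer takes one at the first firing of each group of q firings.

```python
from fractions import Fraction
from math import ceil, floor

def default_production(rate):
    """Tokens produced by firings 1..q of an actor with rational rate p/q."""
    q = rate.denominator
    return [floor(k * rate) - floor((k - 1) * rate) for k in range(1, q + 1)]

def default_consumption(rate):
    """Tokens consumed by firings 1..q of an actor with rational rate p/q."""
    q = rate.denominator
    return [ceil(k * rate) - ceil((k - 1) * rate) for k in range(1, q + 1)]

radar = default_production(Fraction(1, 4))
fusion_from_lidar = default_consumption(Fraction(1, 3))
print(radar)              # [0, 0, 0, 1]
print(fusion_from_lidar)  # [1, 0, 0]
```

Both results agree with the cyclo-static sequences chosen by hand for variant "c", and every entry is either the rate rounded down or the rate rounded up.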

*Initial Conditions.* With the frequency labeling and rational communication rates, we obtain a model that describes as closely as possible the communication and timing requirements of our illustrative example. But there are causality issues in this model. Figure 2(a) illustrates the timing of actor firings in our example, and the data dependencies between them, according to the semantics defined in the next section. The data dependencies marked by a cross are not satisfied in time.

This kind of causality issue can also appear in SDF: in the case of cyclic graphs, the firings of the actors in a cycle all depend on each other. To prevent this, it is possible to *mark* the channels with an initial number of tokens, allowing sufficient initial firings to complete the firing of all actors in the cycle. The liveness property of an SDF graph is verified when all the cycles in the graph are marked with enough tokens to prevent a deadlock [14]. With the SDF extensions we propose, this condition is no longer sufficient. We need to be able to shift the production or consumption of tokens in order to make sure that when a firing requires input tokens, they are produced at an earlier tick of the global clock.

One way to achieve this is to rotate the default sequences defined by the rational rates. For this, we propose a rational initial marking of the graph. Each channel with natural rates at both ends can be marked with an initial number of tokens as in SDF. Each other channel with rational rate r = p/q on either end can be initially marked with a rational number n + k/q with k < q, which denotes that the channel initially holds n tokens (as in SDF), and the default sequence is rotated by k. If the rational rate is on the producer, the default sequence is rotated left, otherwise it is rotated right. In Fig. 1, considering the default sequences denoted by "c", the corresponding rational rates denoted by upper index "d", and the initial marking (ii), the marking of 3/4 on the channel connecting the radar and fusion actors rotates the default sequence [0, 0, 0, 1] of the radar, which is the producer, by 3 elements to the left, yielding the sequence [1, 0, 0, 0].
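The effect of a rational marking can be reproduced with the same floor-based token counting as in Sect. 3. The Python sketch below is an illustration under that assumption, with hypothetical function names; the consumer-side marking of 1/3 is an invented example and is not taken from Fig. 1.

```python
from fractions import Fraction
from math import floor

def production_sequence(rate, marking):
    """Tokens produced by firings 1..q, starting from a rational marking."""
    q = rate.denominator
    return [floor(marking + k * rate) - floor(marking + (k - 1) * rate)
            for k in range(1, q + 1)]

def consumption_sequence(rate, marking):
    """Tokens consumed by firings 1..q, starting from a rational marking."""
    q = rate.denominator
    return [floor(marking - (k - 1) * rate) - floor(marking - k * rate)
            for k in range(1, q + 1)]

def rotate_left(seq, k):
    k %= len(seq)
    return seq[k:] + seq[:k]

def rotate_right(seq, k):
    return rotate_left(seq, -k)

# Radar (producer, rate 1/4) with the marking 3/4 of initial condition (ii).
radar = production_sequence(Fraction(1, 4), Fraction(3, 4))
print(radar)                             # [1, 0, 0, 0]
print(rotate_left([0, 0, 0, 1], 3))      # [1, 0, 0, 0]

# Hypothetical consumer with rate 1/3 and marking 1/3.
consumer = consumption_sequence(Fraction(1, 3), Fraction(1, 3))
print(consumer)                          # [0, 1, 0]
print(rotate_right([1, 0, 0], 1))        # [0, 1, 0]
```

In both cases the sequence obtained from the marked channel state coincides with the rotation rule stated above: left for a rational rate on the producer, right for a rational rate on the consumer.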

Another way to prevent unsatisfied data dependencies is to shift the first tick on which a frequency labeled actor must fire. We propose to add a *phase* to each of these actors, giving the offset from the first tick at which it must fire. With the semantics formalized in the next section, that phase is constrained in order to have a periodic global clock. Figure 2(b) takes into account the marking and phase denoted (ii) in Fig. 1. With the rational marking, the dependencies between the radar and fusion firings are now satisfied, and with the phase on the display actor, the dependencies between the camera and display firings are also satisfied.

## **3 Formalization of the Polygraph Model**

We denote by $\mathbb{B}$ the set $\{0, 1\}$, by $\mathbb{Z}$ the set of integers, by $\mathbb{N} = \{n \in \mathbb{Z} \mid n \geq 0\}$ the set of natural integers, and by $\mathbb{Q}$ the set of rational numbers. For any set $S$, the free semigroup on $S$ is denoted $S^+$.

*System graph.* A *system graph* is a structure used to represent the topology of the communications. Formally, it is a connected finite directed graph $G = (V, E)$ with set of vertices $V$ and set of edges $E \subseteq V \times V$ such that $V$ is the set of *actors* and $E$ is the set of *channels*. We use an index notation to identify elements with respect to a given actor or channel, considering that $E$ and $V$ are sets indexed respectively in $\{1, \dots, |E|\}$ and $\{1, \dots, |V|\}$. We denote $v_i$ (resp. $e_j$) the actor (resp. channel) of index $i$ (resp. $j$). For an actor $v \in V$, let $\mathrm{in}(v) = \{\langle v', v \rangle \in E \mid v' \in V\}$ denote the set of *input channels* of $v$ and $\mathrm{out}(v) = \{\langle v, v' \rangle \in E \mid v' \in V\}$ the set of *output channels* of $v$.
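As a small illustration, the system graph of the running example and the sets of input and output channels can be encoded as follows in Python. The edge list is an assumption based on the textual description in Sect. 2 (sensors feed the fusion actor, which feeds the display); Fig. 1 may contain further details that are not reflected here.

```python
# Actors and channels of the running example, as described in Sect. 2.
actors = ["camera", "radar", "lidar", "fusion", "display"]
channels = [
    ("camera", "fusion"),
    ("radar", "fusion"),
    ("lidar", "fusion"),
    ("fusion", "display"),
]

def in_channels(v, edges):
    """Set of channels whose destination is v."""
    return {(src, dst) for (src, dst) in edges if dst == v}

def out_channels(v, edges):
    """Set of channels whose source is v."""
    return {(src, dst) for (src, dst) in edges if src == v}

print(sorted(in_channels("fusion", channels)))
print(sorted(out_channels("fusion", channels)))
```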

*Topology matrix and channel states.* As for SDF and its derivations [3,14], the communication rates are defined by a topology matrix with one row per channel and one column per actor. The only difference in this definition is that we rely on rational numbers. The absolute value of a rate in the matrix defines how many tokens are produced or consumed per firing of the corresponding actor on the corresponding channel, and the sign of that rate indicates if the tokens are produced (positive rate) or consumed (negative rate). For a given actor and channel, the rate must be 0 if the actor is not connected to the channel, or if the actor is connected to both ends of the channel.

**Definition 1 (Topology matrix).** *A matrix* $\mathbf{\Gamma} = (\gamma_{ij}) \in \mathbb{Q}^{|E| \times |V|}$ *is a* topology matrix *of a system graph* $G$ *if for every channel* $e_i = \langle v_j, v_k \rangle \in E$ *we have:*


We also use a rational number per channel to track the communication state of the system during an execution. A channel state is a vector with one row per channel. Each coordinate in the vector tracks the respective number of firings of the connected actors, by addition of their rates when they fire, and that coordinate rounded down is the number of tokens in the channel.

**Definition 2 (Channel state).** *A vector* $\mathbf{c} \in \mathbb{Q}^{|E| \times 1}$ *is a* channel state *of a system graph* $G$ *with topology matrix* $\mathbf{\Gamma}$ *if for every channel* $e_i = \langle v_j, v_k \rangle \in E$*, the denominator of* $c_i$ *is the maximum between the denominators of* $\gamma_{ij}$ *and* $\gamma_{ik}$*, and* $\lfloor c_i \rfloor$ *is the number of tokens in the channel. We denote* $C \subseteq \mathbb{Q}^{|E| \times 1}$ *the set of all these possible states.*

*Timed actors and global clock.* A subset $V_F \subseteq V$ of *timed actors* are constrained by a *frequency*, expressed as a strictly positive natural number. We use a frequency mapping $\omega : V_F \to \mathbb{N}_{>0}$ in order to map the timed actors to their frequency. There is an implicit system time unit, and each timed actor $v_i \in V_F$ is supposed to be fired exactly $\omega_i := \omega(v_i)$ times per system time unit. In order to have a minimal system time unit, we consider that the greatest common divisor of all the frequencies is $\gcd(\omega[V_F]) = 1$. This is not limiting, since any set of frequencies and system time unit can be adjusted to fit this constraint.

In addition, the timed actors must fire synchronously with respect to a global clock. The *resolution* of that global clock is a sufficient number of *ticks* per system time unit to associate to each tick the set of timed actors that must fire at the corresponding date. For this, we consider the ticks $0, 1, \dots, \pi - 1$ per system time unit, where $\pi$ is the least common multiple of all the actor frequencies, $\pi = \mathrm{lcm}(\{\omega_i \mid v_i \in V_F\})$. Note that if $V_F$ is empty, $\pi = 1$, and the global clock does not constrain the firing of any actor.

Given a timed actor $v_i \in V_F$, there should be $\omega_i$ out of $\pi$ ticks associated with that actor's firings. To reflect the periodic nature of the firing of timed actors, for a timed actor $v_i$ of period $p_i = \pi/\omega_i$, it fires every $p_i$-th tick.

As mentioned in Sect. 2, all the timed actors have a *phase*. We use a phase mapping $\varphi : V_F \to \mathbb{N}$ to map the timed actors to their phase. The first firing of each timed actor $v_i \in V_F$ occurs at the tick $\varphi_i := \varphi(v_i)$. The only constraint to respect the expected frequency of the firings is that $\forall v_i \in V_F$ we have $0 \leq \varphi_i < \pi/\omega_i$.

**Definition 3 (Global clock, firing ticks).** *For a system graph* $G$ *with frequency mapping* $\omega$*, resolution* $\pi$*, and phase mapping* $\varphi$*, the* global clock *is a set* $T = \{0, 1, \dots, \pi - 1\}$ *and for each timed actor* $v_i \in V_F$ *there is a subset of* firing ticks $T_i = \{\tau \in T \mid \tau \equiv \varphi_i \pmod{\pi/\omega_i}\}$*.*
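The construction of the global clock can be sketched in Python as follows. The function names and the zero phases are assumptions made for illustration, and only the three sensor frequencies quoted in Sect. 2 are used, since the frequency of the display actor is only given in Fig. 1.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def normalize(frequencies):
    """Divide all frequencies by their gcd so that the gcd becomes 1."""
    g = reduce(gcd, frequencies.values(), 0)
    return {actor: f // g for actor, f in frequencies.items()}

def resolution(frequencies):
    """Number of ticks per system time unit (1 if there is no timed actor)."""
    return reduce(lcm, frequencies.values(), 1)

def firing_ticks(frequencies, phases):
    """Ticks tau with tau congruent to the phase modulo the period pi/omega."""
    pi = resolution(frequencies)
    ticks = {}
    for actor, omega in frequencies.items():
        period = pi // omega
        phase = phases.get(actor, 0)
        if not 0 <= phase < period:
            raise ValueError(f"phase of {actor} must be in [0, {period})")
        ticks[actor] = [t for t in range(pi) if t % period == phase]
    return ticks

omega = normalize({"camera": 30, "radar": 120, "lidar": 10})
pi = resolution(omega)
ticks = firing_ticks(omega, phases={})
print(omega)            # {'camera': 3, 'radar': 12, 'lidar': 1}
print(pi)               # 12
print(ticks["camera"])  # [0, 4, 8]
```

With these normalized frequencies the system time unit is a tenth of a second, the radar fires on every tick, and the lidar fires only on tick 0.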

*Polygraphs.* We now define the notion of *polygraph* which introduces a basic communication topology, a topology matrix, a frequency and phase mapping for all timed actors, and an initial marking of the graph.

**Definition 4 (Polygraph, initial marking).** *A* polygraph *is a tuple* $P = \langle G, \mathbf{\Gamma}, \omega, \varphi, \mathbf{m} \rangle$ *where* $G$ *is a system graph,* $\mathbf{\Gamma}$ *is a topology matrix,* $\omega$ *is a frequency mapping,* $\varphi$ *is a phase mapping and* $\mathbf{m} \in C$ *is an* initial marking *such that* $\forall e_i \in E$ *we have* $m_i \geq 0$*.*

In the following, we consider that a polygraph $P = \langle G, \mathbf{\Gamma}, \omega, \varphi, \mathbf{m} \rangle$ is given, with its global clock $T$ and sets of firing ticks $T_i$ for all the timed actors $v_i \in V_F$.

*States and transitions.* The state of a polygraph is composed of a channel state, the current tick of the global clock, and a vector with one row per actor used to track the number of firings of the timed actors since the last change in the current tick. This *tracking vector* is used to check that the timed actors respect their synchronous firing constraints.

**Definition 5 (State).** *A* state *of a polygraph* $P$ *is a tuple* $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ *where* $\mathbf{c} \in C$ *is a channel state,* $\tau \in T$ *is a tick, and* $\mathbf{a} \in \mathbb{N}^{|V| \times 1}$ *is a tracking vector. We denote* $S \subseteq C \times T \times \mathbb{N}^{|V| \times 1}$ *the set of all possible states for* $P$*.*

The effect of the firing of an actor on the channel state is to add its rates to the respective coordinate of all the channels. For an actor $v_i$, the $i$-th column of $\mathbf{\Gamma}$ gives all the rates per channel. Therefore, to extract that column from the matrix for each actor $v_i \in V$, we use a *unitary firing vector* $\mathbf{u} \in \mathbb{B}^{|V| \times 1}$, such that $u_i = 1$, and for all $j \neq i$ we have $u_j = 0$. We denote $U \subset \mathbb{B}^{|V| \times 1}$ the set of these vectors, and for convenience we denote the unitary activation vector of actor $v_i$ by $\mathbf{u}_i$. With the unitary firing vector of any actor $v_i$, the product $\mathbf{\Gamma}\mathbf{u}_i$ gives a vector holding for each channel $e_j$ the rate of $v_i$ on $e_j$. For any channel state $\mathbf{c}$, the channel state after the atomic firing of $v_i$ is then $\mathbf{c} + \mathbf{\Gamma}\mathbf{u}_i$. Also, the firing of a timed actor is tracked by adding its unitary firing vector to the tracking vector. The firing of an actor has no effect on the current tick.

**Definition 6 (Fire).** *For a polygraph* $P$*, the mapping* $\mathit{fire} : U \times S \to S$ *maps a unitary activation vector* $\mathbf{u}_i$ *and a state* $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ *to the state* $s' = \langle \mathbf{c}', \tau', \mathbf{a}' \rangle$ *such that we have* $\mathbf{c}' = \mathbf{c} + \mathbf{\Gamma}\mathbf{u}_i$*,* $\tau' = \tau$*, and if* $v_i \in V_F$ *then* $\mathbf{a}' = \mathbf{a} + \mathbf{u}_i$*, otherwise* $\mathbf{a}' = \mathbf{a}$*.*

*Remark 1.* For two consecutive firings of any actors $v_i$ and $v_j$ from a state $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$, the resulting state $s' = \langle \mathbf{c}', \tau', \mathbf{a}' \rangle$ does not depend on the order of the firings, and $\mathbf{c}' = \mathbf{c} + \mathbf{\Gamma}(\mathbf{u}_i + \mathbf{u}_j)$. This property can be generalized to any finite number of consecutive firings.

The other possible transition between two states occurs when the global clock ticks. When the global clock ticks, the channel state is not changed, the current tick is adjusted, and the tracking vector is reset.

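**Definition 7 (Tick).** *For a polygraph* $P$*, the mapping* $\mathit{tick} : S \to S$ *maps a state* $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ *to the state* $s' = \langle \mathbf{c}', \tau', \mathbf{a}' \rangle$ *such that we have* $\mathbf{c}' = \mathbf{c}$*,* $\tau' = (\tau + 1) \bmod \pi$*, and* $\mathbf{a}' = \mathbf{0}$*.*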
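The two mappings of Definitions 6 and 7 can be illustrated with a minimal Python sketch. It assumes integer rates only (the rational rates of Polygraph are not modeled), and the names `State`, `GAMMA`, `TIMED`, and the two-actor example graph are hypothetical, chosen for illustration rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class State:
    c: Tuple[int, ...]   # channel state, one entry per channel
    tau: int             # current tick of the global clock
    a: Tuple[int, ...]   # tracking vector, one entry per actor


def fire(gamma: List[List[int]], timed: Set[int], i: int, s: State) -> State:
    """Definition 6: add column i of gamma to c; track the firing only if v_i is timed."""
    c = tuple(cj + row[i] for cj, row in zip(s.c, gamma))
    a = tuple(ak + 1 if (k == i and i in timed) else ak for k, ak in enumerate(s.a))
    return State(c, s.tau, a)


def tick(period: int, s: State) -> State:
    """Definition 7: keep c, advance the tick modulo the period, reset the tracking vector."""
    return State(s.c, (s.tau + 1) % period, tuple(0 for _ in s.a))


# Hypothetical graph: one channel on which actor 0 produces 2 tokens
# and actor 1 (a timed actor) consumes 1 token.
GAMMA = [[2, -1]]
TIMED = {1}

s0 = State(c=(0,), tau=0, a=(0, 0))
s1 = fire(GAMMA, TIMED, 0, s0)   # untimed firing: only the channel state changes
s2 = fire(GAMMA, TIMED, 1, s1)   # timed firing: also recorded in the tracking vector
s3 = tick(2, s2)                 # clock tick: the tracking vector is reset
```

Firing actor 1 before actor 0 from `s0` yields the same state `s2`, which is the order independence stated in Remark 1. Note that these functions do not check whether a firing is allowed; they only compute its effect.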

*Executions.* The state of $P$ can evolve by successive application of either $\mathit{fire}$ or $\mathit{tick}$. An *execution* of $P$ is a sequence of such applications starting from a state $s_1 \in S$ and leading to states $e = s_1 \cdots s_n \in S^+$. However, with the frequency constraints, there are some conditions for the applications.

Consider the firing $\mathit{fire}(\mathbf{u}_i, s)$ of a timed actor $v_i$ in a state $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$. In this case, $v_i$ may fire only if the current tick $\tau$ is one of its firing ticks, *i.e.* $\tau \in T_i$. Since it must fire exactly once on such a tick, an additional constraint to fire a timed actor $v_i$ is that it has not fired yet, *i.e.* its coordinate in the tracking vector $\mathbf{a}$ is $a_i = 0$. To capture this constraint, we define a *tick firing vector* $\mathbf{t}^\tau \in \mathbb{B}^{|V| \times 1}$ for each tick $\tau \in T$, in which a coordinate is set to one if the corresponding actor is expected to fire at tick $\tau$. More formally, for any $v_i \in V \setminus V_F$ we have $t^\tau_i = 0$, and for any $v_j \in V_F$ we have $t^\tau_j = 1$ if $\tau \in T_j$, and $t^\tau_j = 0$ otherwise. The constraint to fire $v_i \in V_F$ in a state with current tick $\tau$ and tracking vector $\mathbf{a}$ is then $a_i < t^\tau_i$.

The clock update $\mathit{tick}(s)$ in a state $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ is also subject to a constraint: the timed actors that were supposed to fire synchronously with the current tick have done so exactly once, *i.e.* $\mathbf{a} = \mathbf{t}^\tau$.
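As an illustration of the two synchrony constraints just described, the following Python sketch computes a tick firing vector and checks whether a firing or a clock tick is permitted. The dictionary `FIRING_TICKS` (timed actor index to its set of firing ticks) and the three-actor example are assumptions made for this sketch, not part of the paper.

```python
from typing import Dict, List, Sequence, Set


def tick_firing_vector(firing_ticks: Dict[int, Set[int]], n_actors: int, tau: int) -> List[int]:
    """t^tau: 1 for each timed actor expected to fire at tick tau, 0 elsewhere."""
    return [1 if tau in firing_ticks.get(i, set()) else 0 for i in range(n_actors)]


def may_fire(firing_ticks: Dict[int, Set[int]], i: int, tau: int, a: Sequence[int]) -> bool:
    """Untimed actors are unconstrained; a timed actor v_i needs a_i < t^tau_i."""
    if i not in firing_ticks:
        return True
    return a[i] < tick_firing_vector(firing_ticks, len(a), tau)[i]


def may_tick(firing_ticks: Dict[int, Set[int]], tau: int, a: Sequence[int]) -> bool:
    """The clock may tick only when every expected timed actor has fired once: a == t^tau."""
    return list(a) == tick_firing_vector(firing_ticks, len(a), tau)


# Hypothetical setup: actor 0 is untimed, actor 1 fires at ticks 0 and 2,
# actor 2 fires at tick 1; the global clock has period 4.
FIRING_TICKS = {1: {0, 2}, 2: {1}}
```

For example, at tick 0 with an all-zero tracking vector, actor 1 may fire but actor 2 may not, and the clock may not tick until actor 1 has fired.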

**Definition 8 (Synchronous execution).** *An execution* $e = s_1 \cdots s_n \in S^+$ *of a polygraph* $P$ *is* synchronous *if* $\forall\, 1 \leq k < n$*, we have* $s_k = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ *such that:*


Until now, we considered executions of a polygraph where the order of the firings is constrained only by the frequencies. However, for an actor to fire, there must be enough tokens on its input channels, or its rational communication rate must allow firings consuming 0 tokens. In order to fire an actor $v_i$ in a state $s = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$, we require that for each input channel $e_j$ of $v_i$, since the rate $\gamma_{ji}$ is negative, the channel state $c_j$ must be large enough to avoid reaching a negative state, *i.e.* $c_j + \gamma_{ji} \geq 0$, or equivalently $c_j \geq |\gamma_{ji}|$. This constraint requires an ordering of the actor firings such that a producer is fired a sufficient number of times for a consumer to be able to fire in turn.
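The token constraint above can be checked per firing, as in this small Python sketch. It assumes integer rates only, so the case of rational rates that allow firings consuming 0 tokens is deliberately left out; the function name and the example matrix are hypothetical.

```python
from typing import List, Sequence


def is_non_blocking_firing(gamma: List[List[int]], i: int, c: Sequence[int]) -> bool:
    """True if firing v_i from channel state c leaves every channel non-negative."""
    return all(cj + row[i] >= 0 for cj, row in zip(c, gamma))


# Hypothetical graph: one channel, actor 0 produces 2 tokens, actor 1 consumes 1.
GAMMA = [[2, -1]]
```

With an empty channel, the consumer (actor 1) cannot fire while the producer (actor 0) can, which is why a producer must be scheduled before its consumer.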

**Definition 9 (Non-blocking execution).** *An execution* $e = s_1 \cdots s_n \in S^+$ *of a polygraph* $P$ *is* non-blocking *if* $\forall\, 1 \leq k < n$*, we have* $s_k = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ *such that:*


*Consistency property.* If verified, the *consistency* property of $P$ guarantees that it is possible to build a synchronous execution $e = s_1 \cdots s_n \in S^+$ such that $s_1 = \langle \mathbf{m}, 0, \mathbf{0} \rangle$ and $s_1 = s_n$. Such an execution is called a *consistent* execution of $P$, and can obviously be repeated an indefinite number of times to build a consistent execution of arbitrary length. [14, Theorem 1] states that a necessary and sufficient condition for a given SDF graph to be consistent is that there is a non-trivial solution $\mathbf{x}$ to $\mathbf{\Gamma}\mathbf{x} = \mathbf{0}$.

To extend this result to polygraphs, as explained in the previous section, we need to take into account the frequencies of the timed actors. In other words, we need to make sure that it is possible to have a synchronous execution with $x_i$ firings per actor $v_i$. The additional constraint due to the frequencies is that the number of firings $x_i$ of all the timed actors $v_i$ corresponds to a number $r \in \mathbb{N}$ of repetitions of the global clock period.

To state the conditions for a polygraph to be consistent, we thus want to separate the number of firings of the timed actors from the others. We define the vector $\mathbf{t} = \sum_{\tau \in T} \mathbf{t}^\tau$ giving for each timed actor $v_i$ the number $t_i$ of expected firings per period of the global clock. We then define the set $Y \subset \mathbb{N}^{|V| \times 1}$ of vectors $\mathbf{y}$ such that we have a number of firings $y_i \neq 0$ only for $v_i \in V \setminus V_F$.

**Theorem 1.** *A polygraph* $P$ *has a consistent execution if and only if there exists a non-trivial solution* $\mathbf{x} \in \mathbb{N}^{|V| \times 1}$ *to* $\mathbf{\Gamma}\mathbf{x} = \mathbf{0}$ *such that* $\mathbf{x} = \mathbf{y} + r\mathbf{t}$ *for some* $\mathbf{y} \in Y$ *and* $r \in \mathbb{N}$*. Any such solution is called a* repetition vector *of* $P$*. Moreover, there exists a* minimal repetition vector $\mathbf{x}$ *such that for any other repetition vector* $\mathbf{x}'$ *we have* $\mathbf{x}' = k\mathbf{x}$ *for some* $k \in \mathbb{N}$*.*

*Sketch of proof.* First, we prove that the condition is sufficient, and suppose that there exists such a solution **x**. Then we can decompose:

$$\mathbf{x} = \mathbf{y} + \underbrace{(\mathbf{t}^{0} + \dots + \mathbf{t}^{\pi - 1})}_{=\,\mathbf{t}} + \dots + \underbrace{(\mathbf{t}^{0} + \dots + \mathbf{t}^{\pi - 1})}_{=\,\mathbf{t}}.$$

The required consistent execution can be obtained by constructing subexecutions corresponding to this decomposition, relying on Definition 8 and Remark 1.

*Claim (1).* There exists a synchronous execution $e_1 \in S^+$ with starting state $s = \langle \mathbf{m}, 0, \mathbf{0} \rangle$ and ending state $s' = \langle \mathbf{m} + \mathbf{\Gamma}\mathbf{y}, 0, \mathbf{0} \rangle$.

The execution $e_1$ is constructed by applying $y_i$ firings of each actor $v_i \in V \setminus V_F$ (in any order). Since the fired actors are not timed actors, any such sequence is synchronous. The resulting channel state is $\mathbf{m} + \mathbf{\Gamma}\mathbf{y}$ as per Remark 1.

*Claim (2).* For any starting state $s = \langle \mathbf{c}, \tau, \mathbf{0} \rangle$, there exists a synchronous execution $e_2 \in S^+$ starting from $s$ with ending state $s' = \langle \mathbf{c} + \mathbf{\Gamma}\mathbf{t}^\tau, (\tau + 1) \bmod \pi, \mathbf{0} \rangle$.

The execution $e_2$ for $\tau$ is constructed by firing exactly once each timed actor supposed to do so at tick $\tau$, and then applying the $\mathit{tick}$ mapping.

*Claim (3).* For any starting state $s = \langle \mathbf{c}, 0, \mathbf{0} \rangle$, there exists a synchronous execution $e_3 \in S^+$ starting from $s$ with ending state $s' = \langle \mathbf{c} + \mathbf{\Gamma}\mathbf{t}, 0, \mathbf{0} \rangle$.

The execution $e_3$ is obtained by successively executing $e_2$ for $\tau = 0, \dots, \pi - 1$.

*Claim (4).* There exists a synchronous execution $e_4 \in S^+$ with starting state $s = \langle \mathbf{m}, 0, \mathbf{0} \rangle$ and ending state $s' = \langle \mathbf{m} + \mathbf{\Gamma}(\mathbf{y} + r\mathbf{t}), 0, \mathbf{0} \rangle$.

The sequence $e_4$ is constructed by executing $e_1$, followed by $e_3$ repeated $r$ times. Hence, given that $\mathbf{\Gamma}\mathbf{x} = \mathbf{0}$ and $\mathbf{x} = \mathbf{y} + r\mathbf{t}$, it can be easily checked that the ending state of $e_4$ is the same as its starting state, and $e_4$ is consistent. The fact that the condition is also necessary follows from the definitions. Since the current tick must return to 0 after a consistent execution, such an execution must perform a number $r$ of periods of the global clock for some $r \in \mathbb{N}$; in other words, it must contain $r\pi$ applications of the $\mathit{tick}$ mapping and $r t_i$ firings of each timed actor $v_i$. The existence of a minimal solution immediately follows from the fact that in this case $\mathrm{rank}(\mathbf{\Gamma}) = |V| - 1$ according to [14, Corollary of Lemma 2].

Due to lack of space, a detailed proof is left to the reader.
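The condition of Theorem 1 can be checked for a candidate vector with a short Python sketch. It assumes integer rates and that every timed actor has at least one firing tick per period; the function names and the example graphs are hypothetical and only illustrate the decomposition into an untimed part and a whole number of clock periods.

```python
from typing import Dict, List, Optional, Set


def summed_tick_vector(firing_ticks: Dict[int, Set[int]], n_actors: int, period: int) -> List[int]:
    """t: expected number of firings per clock period for each actor (0 if untimed)."""
    return [len({tau for tau in firing_ticks.get(i, set()) if 0 <= tau < period})
            for i in range(n_actors)]


def repetitions(gamma: List[List[int]], firing_ticks: Dict[int, Set[int]],
                period: int, x: List[int]) -> Optional[int]:
    """Return r if x is a non-trivial solution of gamma x = 0 with x = y + r t, else None."""
    n = len(x)
    if not any(x):
        return None                                   # trivial solution
    if any(sum(row[i] * x[i] for i in range(n)) != 0 for row in gamma):
        return None                                   # gamma x != 0
    t = summed_tick_vector(firing_ticks, n, period)
    r_values = set()
    for i in firing_ticks:                            # timed actors only
        if t[i] == 0 or x[i] % t[i] != 0:
            return None                               # x_i is not a multiple of t_i
        r_values.add(x[i] // t[i])
    if len(r_values) > 1:
        return None                                   # timed actors disagree on r
    return r_values.pop() if r_values else 0          # no timed actor: r is arbitrary, report 0


# Hypothetical graph A: actor 0 (untimed) produces 2, actor 1 (timed, ticks 0 and 1) consumes 1.
GAMMA_A, TICKS_A = [[2, -1]], {1: {0, 1}}
# Hypothetical graph B: both actors timed, rates 1 and -1, but with different frequencies.
GAMMA_B, TICKS_B = [[1, -1]], {0: {0}, 1: {0, 1}}
```

In graph A, `[1, 2]` is accepted with one clock period. In graph B, the balance equation forces equal firing counts while the frequencies force a 1:2 ratio, so no repetition vector exists; this is the kind of inconsistency that the SDF condition alone would not detect.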

*Liveness property.* If verified, the *liveness* property of $P$ guarantees that it is possible to build a consistent execution $e = s_1 \cdots s_n \in S^+$ such that $e$ is also a non-blocking execution. Such an execution $e$ is called a *live execution*.

In a way similar to [14, Theorem 3], we define the notion of a scheduler building only synchronous and non-blocking executions. Our goal is to show that P has a live execution if and only if any such scheduler can build a consistent execution.

From now on, we consider that $P$ is consistent with minimal repetition vector $\mathbf{x}$. We define the mapping $\mathit{count} : V \times S^+ \to \mathbb{N}$ that, given an actor $v_i$ and an execution $e = s_1 \cdots s_n \in S^+$, returns the number of firings of $v_i$ in $e$, *i.e.* the number of $k$ such that $1 \leq k < n$ and $s_{k+1} = \mathit{fire}(\mathbf{u}_i, s_k)$. Notice that since a live execution $e$ of $P$ is also consistent, by definition we have $\forall v_i \in V, \mathit{count}(v_i, e) = x_i$. Also, we say that an actor $v_i \in V$ is *runnable* after an execution $e \in S^+$ with ending state $s$ if $\mathit{count}(v_i, e) < x_i$ and the one-step execution $s\,s' \in S^+$ with $s' = \mathit{fire}(\mathbf{u}_i, s)$ is synchronous and non-blocking.

**Definition 10 (Scheduler).** *A* scheduler *of* $P$ *is a mapping* $\sigma : S^+ \to S^+$ *that maps an execution* $e = s_1 \cdots s_n \in S^+$ *to an execution* $e' \in S^+$ *such that if we denote* $s_n = \langle \mathbf{c}, \tau, \mathbf{a} \rangle$ *we have:*


An execution defined by a scheduler $\sigma$ is the fixed point constructed by recursive application<sup>1</sup> of $\sigma$ starting from an initial execution $e = (\langle \mathbf{m}, 0, \mathbf{0} \rangle)$.

**Theorem 2.** *Let* $P$ *be a consistent polygraph with minimal repetition vector* $\mathbf{x}$*,* $\sigma$ *a scheduler of* $P$*, and* $e$ *the execution defined by* $\sigma$*. Then* $P$ *has a live execution if and only if* $\forall v_i \in V, \mathit{count}(v_i, e) = x_i$*.*

*Sketch of proof.* The condition is obviously sufficient. The proof that it is also necessary can be made by induction. If $e$ is a live execution and $e'$ is a synchronous and non-blocking execution constructed by $\sigma$ so far, with $|e'| < |e|$, we can show that $e'$ can be extended by one more step (*e.g.* by taking the first step present in $e$ but not in $e'$, since its preconditions are necessarily satisfied).

## **4 Tool Support for Liveness Checking**

DIVERSITY is a customizable model analysis tool based on symbolic execution, available in the *Eclipse Formal Modeling Project* [17]. DIVERSITY provides a pivot language called *xLIA* (eXecutable Language for Interaction and Architecture) introducing a set of communication and execution primitives allowing one to encode a wide class of dynamic model semantics [2,9], Communicating STS [1], and abstractions of hybrid systems [15]. In this work, we use it to analyze Polygraph models, to check their liveness in a similar way to that defined by a scheduler as per Definition 10.

The root entity in an xLIA model is a so-called *system*. A system is an executable entity that can be atomic (state-machine) or compositional or hierarchical. A Polygraph model translated to xLIA is a system where the actors are state-machines with input/output ports associated with the ends of the channels. They communicate asynchronously over FIFO queues, bounded or not, using xLIA connectors. Variables are used to store received tokens on input instructions in transitions, with guards conditioning their firing, and output statements to model their token productions.

Figure 3 represents such a state machine for any actor of the polygraph in Fig. 1. Each transition is labeled with xLIA macros representing the actions performed. The *init* macro moves the initial marking from the input queues to the counter of available input tokens, *canFire()* tests if enough tokens are present for a non-blocking firing, *consumption* decrements the counter of available input tokens, *production* sends the production rate on the successor's queue, and *reception* reads that rate and adds it to the number of available tokens. Regarding state machine semantics, all the states are pseudo-states, except *idle* which is stable. This means that any fired transition must be completed until returning to the idle state. The *else* transition will be evaluated if there is no possible *reception*.

**Fig. 3.** xLIA state machine pattern for an actor of a polygraph

<sup>1</sup> Hence, a scheduler can also be defined as a *partial* mapping on $\sigma^*(\langle \mathbf{m}, 0, \mathbf{0} \rangle)$.

The xLIA language allows a fine-grained definition of an execution model for the actors of a polygraph. Some instructions associate a sequence of actors to fire with each tick of a clock. When attempting to fire a timed actor, only one firing is triggered if possible, and when attempting the same for other actors, as many firings as possible are triggered. Hence, the timed actors are only fired at the expected tick, and cause a deadlock result if that is not possible. For the other actors, a counter limits their number of firings to their coordinate in the minimal repetition vector, as required by Theorem 2. With this setup, for a polygraph $P$ with minimal repetition vector $\mathbf{x} = \mathbf{y} + r\mathbf{t}$, the length of a live execution path is $r\pi$, plus one for the initialization step handling the initial marking. Any path with fewer steps leads to a deadlock.

We tested this technique using DIVERSITY on an *Intel Core i7*. For the polygraph of Fig. 1 with initial marking (ii), the tool finds that the liveness property is verified. We also tested the initial marking (i), and the tool correctly identified a deadlock in less than 200 ms. This example is extracted from a more complex polygraph modeling an Advanced Driver-Assistance System (ADAS), which we also used to evaluate the liveness checking tool. The considered polygraph has 18 actors (5 of which are timed actors) and 32 channels (6 of which have an initial marking), and 10 of its actors have rational communication rates. For a correctly marked model, we find a live execution sequence in 4 s.

## **5 Discussion and Related Work**

In [16], an extension to SDF is proposed to add a single throughput constraint on a channel of a consistent graph. From this constraint, a firing frequency is derived for the actors by transitivity. This approach, while preserving the consistency property by construction, does not allow the expression of a frequency constraint per actor, based on a real-life constraint on the modeled component, nor the explicit synchronization of the firings on a reference time scale.

The programming model PTIDES [18] combines real-time semantics for sensors and actuators with discrete-event semantics for other components such as computation kernels. These other components have an awareness of the real time through a logical time abstraction. The resulting execution semantics has similarities with Polygraph, since some components are constrained by real time and others only react to their stimuli. The semantics of PTIDES is much more flexible than that of Polygraph, since it does not require fixed production or consumption rates. On the other hand, and as opposed to Polygraph, there is no way to derive a consistent and live periodic schedule in PTIDES, which makes static performance prediction more difficult. Nevertheless, since the semantics are similar, we believe that the notion of logical time as defined in PTIDES is applicable to practical distributed implementations of polygraphs.

Synchronous programming languages [7,8] can be used to express a data flow between synchronous periodic nodes, in order to generate correct-by-construction programs. In these approaches, all the nodes are synchronous, while in Polygraph, some actors fire asynchronously when enabled. Also, the goal of our approach is to be able to reason formally on the modeled systems, and automate as many tasks as possible in its design, implementation and validation. Such a task could be the association of the asynchronous firings to ticks of the global clock, and the generation of a synchronous program for automatic code generation.

Recently published research [6] follows a similar approach to ours. By mixing elements from two existing formalisms, one allowing the specification of time-triggered tasks and the other the specification of data flow actors, the expressiveness of the resulting modeling framework is comparable to that of Polygraph. The main difference is that Polygraph is a single formalism with decidable properties and algorithms to check them in practice. In [6], the impact of the combination of constraints from two different formalisms on their respective properties is not discussed, as the proposed approach is more focused on performance evaluation. The experimental results the authors obtained are in favor of the modeling approach we have in common.

## **6 Conclusion**

We have introduced Polygraph, a data flow formalism extending SDF with synchronous firing semantics for the actors. We have shown that with this extension, the existing conditions for deciding the consistency and liveness of a given SDF graph are no longer sufficient. We have extended the corresponding theorems and shown that the expressiveness extensions we proposed do not impact the decidability of these properties. Finally, as a first step towards tool-assisted modeling of polygraphs, we have introduced a framework relying on DIVERSITY to verify their liveness.

Our next step is to further extend Polygraph to add flexibility in the execution semantics, with the same objective of preserving the capability to perform accurate static analysis of a system's performance. Still, with this first extension, there are already interesting research perspectives regarding the applicability of existing static performance analysis techniques, and their potential extensions to take into account the specifics of a polygraph's scheduling.

**Acknowledgement.** Part of this work has been realized in the FACE project, involving CEA List and Renault. The Polygraph formalism has been used as a theoretical foundation for the software methodology in the project.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Software Testing

## CoVeriTest: Cooperative Verifier-Based Testing

Dirk Beyer and Marie-Christine Jakobs

LMU Munich, Munich, Germany

Abstract. Testing is a widely used method to assess software quality. Coverage criteria and coverage measurements are used to ensure that the constructed test suites adequately test the given software. Since manually developing such test suites is too expensive in practice, various automatic test-generation approaches were proposed. Since all approaches come with different strengths, combinations are necessary in order to achieve stronger tools. We study cooperative combinations of verification approaches for test generation, with high-level information exchange. We present CoVeriTest, a hybrid approach for test-case generation, which iteratively applies different conditional model checkers. It thereby allows adjusting the level of cooperation and assigning individual time budgets per verifier. In our experiments, we combine explicit-state model checking and predicate abstraction (from CPAchecker) to systematically study different CoVeriTest configurations. Moreover, CoVeriTest achieves higher coverage than state-of-the-art test-generation tools for some programs.

Keywords: Test-case generation · Software testing · Test coverage · Conditional model checking · Cooperative verification · Model checking

## 1 Introduction

Testing is a commonly used technique to measure the quality of software. Since manually creating such test suites is laborious, automatic techniques are used: e.g., model-based techniques for black-box testing and techniques based on control-flow coverage for white-box testing. Many automatic techniques have been proposed, ranging from random testing [36,57] and fuzzing [26,52,53], over search-based testing [55] to symbolic execution [23,24,58] and reachability analyses [5,12,45,46]. The latter are well-suited to find bugs and derive test suites that achieve high coverage, and several verification tools support test generation (e.g., Blast [5], PathFinder [61], CPAchecker [12]). The reachability checks for all test goals seem too expensive, but in practice, those approaches can be made pretty efficient.

Encouraged by tremendous advances in software verification [3] and a recent case study that compared model checkers with test tools w.r.t. bug finding [17], we study a new kind of combination of reachability analyses for test generation. Combinations are necessary because different analysis techniques have different strengths and weaknesses. For example, consider function foo in Fig. 1. Explicit-state model checking [18,33] tracks the values of variables i and s and easily detects the reachability of the statements in the outermost if branch (lines 3–6), while it has difficulties with the complex condition in the else branch (line 8). In contrast, predicate abstraction [33,39] can easily derive test values for the complex condition in line 8, but to handle the if branch (lines 3–6) it must spend effort on the detection of the predicates s = 0, s = 1, and i = 0. Independently of each other, test approaches [1,34,47,54] and verification approaches [9,10,29,37] employ combinations to tackle such problems. However, there are no approaches yet that combine different reachability analyses for test generation.

Fig. 1. Example program foo

Inspired by abstraction-driven concolic testing [32], which interleaves concolic execution and predicate abstraction, we propose CoVeriTest, which stands for cooperative verifier-based testing. CoVeriTest iteratively executes a given sequence of reachability analyses. In each iteration, the analyses are run in sequence and each analysis is limited by its individual, but configurable time limit. Furthermore, CoVeriTest allows the analyses to share various types of analysis information, e.g., which paths are infeasible, which paths have already been explored, or which abstraction level to use. To get access to a large set of reachability analyses, we implemented CoVeriTest in the configurable software-analysis framework CPAchecker [15]. We used our implementation to evaluate different CoVeriTest configurations on a large set of well-established benchmark programs and to compare CoVeriTest with existing state-of-the-art test-generation techniques. Our experiments confirm that reachability analyses are valuable for test generation.

Contributions. In summary, we make the following contributions:


## 2 Testing with Verifiers

The basic idea behind testing with verifiers is to derive test cases from counterexamples [5,61]. Thus, meeting a test goal during verification has to trigger a specification violation. First, we remind the reader of some basic notations.

<sup>1</sup> We choose the best two tools VeriFuzz and Klee from the international competition on software testing (Test-Comp 2019) [4]. https://test-comp.sosy-lab.org/2019/

<sup>2</sup> https://www.sosy-lab.org/research/coop-testgen/

Programs. Following the literature [9], we represent programs by control-flow automata (CFAs). A CFA $P = (L, \ell_0, G)$ consists of a set $L$ of program locations (the program-counter values), an initial program location $\ell_0 \in L$, and a set of control-flow edges $G \subseteq L \times \mathit{Ops} \times L$. The set $\mathit{Ops}$ describes all possible operations, e.g., assume statements (resulting from conditions in if or while statements) and assignments. For the program semantics, we rely on an operational semantics, which we do not further specify.

Abstract Reachability Graph (ARG). ARGs record the work done by reachability analyses. An ARG is constructed for a program $P = (L, \ell_0, G)$ and stores (a) the abstract state space that has been explored so far, (b) which abstract states must still be explored, and (c) what abstraction level (tracked variables, considered predicates, etc.) is used. Technically, an ARG is a five-tuple $(N, \mathit{succ}, \mathit{root}, F, \pi)$ that consists of a set $N$ of abstract states, a special node $\mathit{root} \in N$ that represents the initial states of program $P$, a relation $\mathit{succ} \subseteq N \times G \times N$ that records already explored successor relations, a set $F \subseteq N$ of frontier nodes, which remembers all nodes that have not been fully explored, and a precision $\pi$ describing the abstraction level. Every ARG must ensure that a node $n$ is either contained in $F$ or completely explored, i.e., all abstract successors have been explored. We use ARGs for information exchange between reachability analyses.
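The five-tuple can be pictured as a small record type. The following Python sketch is only an illustration under simplifying assumptions: abstract states are arbitrary hashable values, the precision is left opaque, and the class and helper names are hypothetical rather than CPAchecker's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Any, Hashable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]   # control-flow edge (location, operation, location)


@dataclass
class ARG:
    nodes: Set[Hashable]                                             # N
    root: Hashable                                                   # root
    succ: Set[Tuple[Hashable, Edge, Hashable]] = field(default_factory=set)
    frontier: Set[Hashable] = field(default_factory=set)             # F
    precision: Any = None                                            # pi

    def is_well_formed(self) -> bool:
        """root and F lie in N, and succ only relates nodes of N."""
        return (self.root in self.nodes
                and self.frontier <= self.nodes
                and all(n in self.nodes and m in self.nodes for n, _, m in self.succ))

    def fully_explored(self) -> bool:
        """No frontier node is left, i.e. F is empty."""
        return not self.frontier


def initial_arg(root: Hashable, precision: Any = None) -> ARG:
    """An ARG in which only the root exists and nothing has been explored yet."""
    return ARG(nodes={root}, root=root, frontier={root}, precision=precision)


arg = initial_arg("n0", precision={"tracked_variables": set()})
```

Starting with the root in the frontier matches the invariant stated above: every node is either in $F$ or completely explored.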

Test Goals. In this paper, we are interested in structural coverage, e.g., branch coverage. Transferred to our notion of programs, this means that our test goals are a subset of the program's control-flow edges. For using a verifier to generate tests, we have to encode the test goals as a specification violation. Figure 2 shows a possible encoding, which uses a protocol automaton. Whenever a test goal is executed, the automaton transits from the initial, safe state $q_0$ to the accepting state $q_e$, which marks a property violation. Note that reachability analyses, which we consider for test generation, can easily monitor such specifications during exploration.
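The two-state protocol automaton can be simulated over a path of control-flow edges, as in the following Python sketch. The edge tuples and the function name `first_violation` are hypothetical; a real verifier monitors the automaton during state-space exploration rather than on a finished path.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[int, str, int]   # (source location, operation, target location)


def first_violation(goals: Set[Edge], path: Iterable[Edge]) -> Optional[int]:
    """Index of the first edge that moves the automaton from q0 to qe, or None."""
    for index, g in enumerate(path):
        if g in goals:          # transition q0 -> qe: a test goal is executed
            return index
        # otherwise the automaton stays in the safe state q0
    return None


# Hypothetical path through a small program with one branch.
E1, E2, E3 = (0, "x > 0", 1), (0, "!(x > 0)", 2), (1, "y = 1", 3)
```

A path is reported as a specification violation exactly when it executes at least one open test goal, so removing a covered goal from `goals` makes the same path safe again.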

Now, we have everything at hand to describe how reachability analyses generate tests. Algorithm 1 shows the test-generation process. The algorithm gets as input a program, a set of test goals, and a time limit for test generation. For cooperative test generation, we need to guide state-space explorations. To this end, we also provide an initial ARG and a condition. A condition is a concept known from conditional model checking [10] and describes which parts of the state space have already been explored by other verifiers. A verifier, e.g., a reachability analysis, can use a condition to ignore the already explored parts of the state space. Verifiers that do not understand conditions can safely ignore them.

At the beginning, Alg. 1 sets up the data structures for the test suite and the set of covered goals. To set up the specification, it follows the idea of Fig. 2. As long as not all test goals are covered, there exist abstract states that must be explored, and the time limit has not elapsed, the algorithm tries to generate new tests. Therefore, it resumes the exploration of the current ARG [5] taking into account program prog, specification ϕ, and (if understood) the condition ψ. If the exploration stops, then it returns an updated ARG. Exploration stops due to one of three reasons: (1) the state space is explored completely (F = ∅), (2) the time limit is reached, or (3) a counterexample has been found.<sup>3</sup> In the last case, a new test is generated. First, a counterexample trace is extracted from the ARG. The trace describes a path through the ARG that starts at the root and whose last edge is a test goal (the reason for the specification violation). Next, a test is constructed from the path and added to the test suite. Basically, the path is converted into a formula and a satisfying assignment<sup>4</sup> is used as the test case. For the details, we refer the reader to the work that defined the method [5]. Additionally, the covered goal (last edge on the counterexample path) is removed from the set of open test goals and added to the set of covered goals. Finally, the specification is updated to no longer consider the covered goal. When the algorithm finishes, it returns the generated test suite, the set of covered goals, and the last ARG considered. The ARG is returned to enable cooperation.

$$q_0 \xrightarrow{\; g \,\in\, \mathit{goals} \;} q_e \qquad \text{(self-loop on } q_0 \text{ labeled } g \notin \mathit{goals}\text{)}$$

Fig. 2. Encoding test goals as specification violation

Algorithm 1. Generating tests with a (conditional) reachability analysis

```
Input: prog = (L, ℓ₀, G), goals ⊆ G, limit ∈ N, arg = (N, succ, root, F, π),
       condition ψ
Output: generated test_suite, covered goals, updated arg
 1: test_suite = ∅; covered = ∅;
 2: ϕ = generate_specification(goals);
 3: while (goals ≠ ∅ and arg.F ≠ ∅ and elapsed_time < limit) do
 4:   arg = explore(prog, ϕ, arg, ψ, limit − elapsed_time);
 5:   if (arg.F ≠ ∅ and elapsed_time < limit) then
 6:     τ = extract_counterexample_trace(arg);
 7:     test_suite = test_suite ∪ generate_test_from_trace(τ);
 8:     goals = goals \ {last_edge(τ)}; covered = covered ∪ {last_edge(τ)};
 9:     ϕ = generate_specification(goals);
10: return (test_suite, covered, arg);
```
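The loop of Algorithm 1 can be sketched in Python with a toy exploration standing in for a real reachability analysis. Everything here is an assumption made for illustration: abstract states are just program locations, every path is treated as feasible, a "test" is simply the edge path to the goal, the time limit is a step budget, and the condition ψ is ignored. None of the names correspond to CPAchecker's API.

```python
from collections import deque


def new_arg(root):
    """Toy ARG: parent links (explored successors) plus a frontier queue."""
    return {"root": root, "parent": {}, "frontier": deque([root])}


def path_to(arg, node):
    """Edges from the root to node, following parent links backwards."""
    trace = []
    while node != arg["root"]:
        node, edge = arg["parent"][node]
        trace.append(edge)
    return trace[::-1]


def explore(edges, goals, arg, budget):
    """Expand frontier nodes until a goal edge is taken, F is empty, or the budget is spent."""
    steps = 0
    while arg["frontier"] and steps < budget:
        n = arg["frontier"][0]
        steps += 1
        for e in (e for e in edges if e[0] == n):
            if e in goals:                       # counterexample: n stays in the frontier
                return path_to(arg, n) + [e], steps
            if e[2] not in arg["parent"] and e[2] != arg["root"]:
                arg["parent"][e[2]] = (n, e)
                arg["frontier"].append(e[2])
        arg["frontier"].popleft()                # n is now completely explored
    return None, steps


def generate_tests(edges, goals, limit, arg):
    goals, test_suite, covered, used = set(goals), [], set(), 0
    while goals and arg["frontier"] and used < limit:
        trace, steps = explore(edges, goals, arg, limit - used)
        used += steps
        if trace is not None:                    # a goal was reached: record a test
            test_suite.append(trace)
            goals.discard(trace[-1])
            covered.add(trace[-1])
    return test_suite, covered, arg


# Hypothetical program with one branch; the goals are the two assignment edges.
E1, E2 = (0, "x > 0", 1), (0, "!(x > 0)", 2)
E3, E4 = (1, "y = 1", 3), (2, "y = 2", 3)
EDGES = [E1, E2, E3, E4]
suite, covered, final_arg = generate_tests(EDGES, {E3, E4}, 100, new_arg(0))
```

Unlike Algorithm 1, the toy `explore` returns the counterexample trace directly instead of leaving it in the ARG, which keeps the sketch short. The returned ARG still has a non-empty frontier, so a later run could resume from it.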

## 3 CoVeriTest

The previous section described how to use a single reachability analysis to produce tests for covering a set of test goals. Due to different strengths and weaknesses, some test goals are harder to cover for one analysis than for another. To maximize the number of covered goals, different analyses should be combined. In CoVeriTest, we rotate analyses for test generation. Thus, we prevent analyses from trying to cover the same goal in parallel, and we do not need to know in advance which analysis can cover which goals. Moreover, analyses that get stuck trying to cover goals that other analyses handle later get a chance to recover. Additionally, CoVeriTest supports cooperation among analyses. More concretely, analyses may extract and use information from ARGs constructed by previous analysis runs.

<sup>3</sup> We assume that an exploration is only complete if no counterexample exists.

<sup>4</sup> We assume that only feasible counterexamples are contained and infeasible counterexamples were eliminated by the reachability analysis during exploration.

```
Algorithm 2. CoVeriTest: alternating reachability analyses to generate tests
```

```
Input: prog = (L, ℓ0, G), goals ⊆ G, total_limit ∈ ℕ, configs ∈ (analysis × ℕ)⁺
Output: test_suite
1: test_suite = ∅; args = ⟨⟩; current = 0;
2: while (goals ≠ ∅ and elapsed_time < total_limit) do
3:   analysis = configs[current].first; limit = configs[current].second;
4:   (arg, ψ) = cooperateAndInit(prog, args, configs.length);
5:   (tests, covered, arg) = analysis(prog, goals, limit, arg, ψ);
6:   test_suite = test_suite ∪ tests; goals = goals \ covered; args = args ◦ arg;
7:   if (arg.F = ∅) then
8:     return test_suite;
9:   current = (current + 1) % configs.length;
10: return test_suite;
```

Algorithm 2 describes the CoVeriTest workflow. It gets four inputs. Program, test goals, and time limit are already known from Alg. 1 (test generation with a single analysis). Additionally, CoVeriTest gets a sequence of configurations, namely pairs of a reachability analysis and a time limit. The time limit paired with an analysis restricts the runtime of that analysis per call (see line 5). In contrast to Alg. 1, CoVeriTest does not get an ARG or a condition. To enable cooperation between analyses, CoVeriTest constructs these two elements individually for each analysis run. During construction, it may extract and use information from the results of previous analysis runs.

```
Algorithm 3. cooperateAndInit: set up start point for analysis exploration, possibly transferring knowledge from previous analysis runs
```

```
Input: prog = (L, ℓ0, G), args ∈ (arg)⁺, numAnalyses ∈ ℕ
Output: ARG for program prog, condition describing explored state space
1: ψ = false; π = ∅; root = (ℓ0, ⊤);
2: if (length(args) ≥ numAnalyses) then
3:   if (reuse-arg) then
4:     return (last_arg_of_analysis(numAnalyses, args), ψ);
5:   if (reuse-precision) then
6:     π = last_arg_of_analysis(numAnalyses, args).π;
7: if (use-condition ∧ length(args) > 0) then
8:   ψ = extract_condition(args[length(args)-1]);
9: return (({root}, ∅, root, {root}, π), ψ);
```

After initializing the test suite and the data structure to store analysis results (args), CoVeriTest repeatedly iterates over the configurations. It starts with the first pair in the sequence and stops iterating when its time limit is exceeded or all goals are covered. In each iteration, CoVeriTest first extracts the analysis to execute and its accompanying time limit (line 3). Then, it constructs the remaining inputs of the analysis: ARG and condition. Details regarding the construction are explained later in Alg. 3. Next, CoVeriTest executes the current analysis with the given program, the remaining test goals, the accompanying time limit, and the constructed ARG and condition. When the analysis has finished, CoVeriTest adds the returned tests to its test suite, removes all test goals covered by the analysis run from the set of goals, and stores the analysis result for cooperation (concatenates arg to the sequence of ARGs). If the analysis finished its exploration (arg.F=∅), any remaining test goal should be unreachable and CoVeriTest returns its test suite. Otherwise, CoVeriTest determines how to continue in the next iteration (i.e., which configuration to consider). At the end of all iterations, CoVeriTest returns its generated test suite.
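The rotation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the CPAchecker implementation: cooperation (ARG and condition) is left out, an abstract budget counter replaces wall-clock time, every call is assumed to consume its full limit, and the toy analyses created by the hypothetical helper `make_analysis` cover at most `limit` goals per call from a fixed set of goals they are able to reach.

```python
def coveritest(goals, configs, total_budget):
    """Rotate over (analysis, limit) pairs until all goals are covered or the budget is spent."""
    goals = set(goals)
    suite, current, spent = [], 0, 0
    while goals and spent < total_budget:
        analysis, limit = configs[current]
        limit = min(limit, total_budget - spent)
        tests, covered, finished = analysis(goals, limit)
        spent += limit                        # assumption: a call uses its whole limit
        suite.extend(tests)
        goals -= covered
        if finished:                          # exploration complete: the rest is unreachable
            break
        current = (current + 1) % len(configs)
    return suite, goals

def make_analysis(name, can_cover):
    """Toy analysis: covers at most `limit` open goals out of the set it can reach."""
    def analysis(goals, limit):
        hit = sorted(goals & can_cover)[:limit]
        return [f"{name}:{g}" for g in hit], set(hit), False
    return analysis

value = make_analysis("value", {"a", "b"})
predicate = make_analysis("predicate", {"b", "c", "d"})
suite, remaining = coveritest({"a", "b", "c", "d"}, [(value, 1), (predicate, 2)], total_budget=6)
```

Here the value analysis covers goal `a`, the predicate analysis covers `b` and `c`, the value analysis then makes no progress, and the predicate analysis finally covers `d`. No analysis needs to know in advance which goals the other one can handle.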

Next, we explain how to construct the ARG and the condition input for an analysis. The ARG describes the level of abstraction and where to continue exploration, while the condition describes which parts of the state space have already been explored. Both guide the exploration of an analysis, which makes them well-suited for cooperation. While there are plenty of possibilities for cooperation, we currently only support three basic options: continue exploration of the previous ARG of the analysis (reuse-arg), reuse the analysis' abstraction level (reuse-precision), and restrict the exploration to the state space left out by the previous analysis (use-condition). The first two options only ensure that an analysis does not lose too much information due to switching. The last option, which is inspired by abstraction-driven concolic execution [32], indeed realizes cooperation between different analyses. Note that the last two options can also be combined.<sup>5</sup> If all options are turned off, no information will be exchanged.

<sup>5</sup> In contrast, the options reuse-arg and use-condition cannot be combined because they are incompatible: the existing ARG does not fit the constructed condition. Since reuse-arg subsumes reuse-precision, combining these two makes no sense.

Algorithm 3 shows the cooperative initialization of ARG and condition discussed above. It gets three inputs: the program, a sequence of args needed to realize cooperation, and the number of analyses used. At the beginning, it initializes the ARG components and the condition assuming no cooperation should be done. The condition states that nothing has been explored, the abstraction level becomes the coarsest available, and the ARG root considers the start of all program executions (initial program location and arbitrary variable values). If no cooperation is configured or the ARG required for cooperation is not available (e.g., in the first round), the returned ARG and condition tell the analysis to explore the complete state space from scratch. In all other cases, the analysis will be guided by information obtained in previous iterations. Option reuse-arg looks up the last ARG of the analysis stored in args. Option reuse-precision considers the same ARG as reuse-arg, but only provides the ARG's precision π. For use-condition, a condition is constructed from the last ARG in args. For the details of the condition construction, we refer to conditional model checking [10].
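The initialization of Alg. 3 can be sketched in Python. The sketch rests on several assumptions that the text does not fix: `last_arg_of_analysis` is taken to be the ARG stored `num_analyses` positions from the end of `args` (which matches a strict round-robin order), the successor relation of the ARG is omitted, and the condition is modelled as the set of fully explored states as a stand-in for the real condition construction of conditional model checking [10]. The class `Arg` and all field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Arg:
    nodes: frozenset       # N: abstract states explored so far
    root: object           # root state of the ARG
    frontier: frozenset    # F: states that still need to be explored
    precision: frozenset   # π: e.g. tracked variables or predicates

def cooperate_and_init(args, num_analyses, reuse_arg=False,
                       reuse_precision=False, use_condition=False):
    condition = frozenset()                  # ψ = false: nothing has been explored
    precision = frozenset()                  # coarsest abstraction level
    root = ("l0", "TOP")                     # initial location, arbitrary variable values
    if len(args) >= num_analyses:
        own_last = args[-num_analyses]       # assumed meaning of last_arg_of_analysis
        if reuse_arg:
            return own_last, condition
        if reuse_precision:
            precision = own_last.precision
    if use_condition and args:
        last = args[-1]
        condition = last.nodes - last.frontier   # stand-in for extract_condition
    return Arg(frozenset({root}), root, frozenset({root}), precision), condition

# Two earlier runs in round-robin order: value analysis first, then predicate analysis.
value_arg = Arg(frozenset({"a", "b"}), "a", frozenset({"b"}), frozenset({"x"}))
pred_arg = Arg(frozenset({"p", "q", "r"}), "p", frozenset({"r"}), frozenset({"x > 0"}))
first_arg, first_cond = cooperate_and_init([], 2, reuse_precision=True, use_condition=True)
next_arg, next_cond = cooperate_and_init([value_arg, pred_arg], 2,
                                         reuse_precision=True, use_condition=True)
```

In the first round nothing is available, so the analysis starts from scratch. In the third call, the value analysis gets back its own precision and a condition derived from the predicate analysis that ran directly before it.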

Next, we study the effectiveness of different CoVeriTest configurations and compare CoVeriTest with existing test-generation tools.

## 4 Evaluation

We systematically evaluate CoVeriTest along the following claims:

Claim 1. For analyses that discard their own results from previous iterations (i.e., reuse-arg and reuse-precision turned off), CoVeriTest achieves higher coverage if switches between analyses happen rarely. *Evaluation Plan:* We look at CoVeriTest configurations in which analyses discard their own, previous results and compare the number of covered test goals reported by configurations that only differ in the analyses' time limits.

Claim 2. For analyses that reuse knowledge from their own, previous execution (i.e., reuse-arg or reuse-precision turned on), CoVeriTest achieves higher coverage if favoring more powerful analyses. *Evaluation Plan:* We look at CoVeriTest configurations in which analyses reuse their own, previous knowledge and compare the number of covered test goals reported by configurations that only differ in the analyses' time limits.

Claim 3. CoVeriTest performs better if analyses reuse knowledge from their own, previous execution (i.e., reuse-arg or reuse-precision turned on). *Evaluation Plan:* From all sets of CoVeriTest configurations that only differ in the analyses' time limits, we select the best and compare these.

Claim 4. Interleaving multiple analyses with CoVeriTest often achieves better results than using only one of the analyses for test generation. *Evaluation Plan:* We compare the number of covered goals reported by the best CoVeriTest configuration with those numbers achieved when running only one analysis of the CoVeriTest configuration for the total time limit.

Claim 5. Interleaving verifiers for test generation is often better than running them in parallel. *Evaluation Plan:* We compare the number of covered goals reported by the best CoVeriTest configuration with the number achieved when running all analyses of the CoVeriTest configuration in parallel.

Claim 6. CoVeriTest complements existing test-generation tools. *Evaluation Plan:* We use the same infrastructure and resources as used by the International Competition on Software Testing (Test-Comp'19)<sup>6</sup> and let the best CoVeriTest configuration construct test suites. These test suites are executed by the Test-Comp'19 validator to measure the achieved branch coverage. Then, we compare the coverage achieved by CoVeriTest with the coverage of the best two test-generation tools from Test-Comp'19.

<sup>6</sup> https://test-comp.sosy-lab.org/2019/

#### 4.1 Setup

CoVeriTest Configurations. We implemented CoVeriTest in the software analysis framework CPAchecker [15]. Basically, we implemented Algs. 1 and 2 and integrated Alg. 3 into Alg. 2. For condition construction, we reuse the code from conditional model checking [10]. For our experiments, we combine value analysis [18] and predicate analysis [16]. Both have been used in cooperative verification [10,11,21].

*Value analysis.* CPAchecker's value analysis [18] explicitly tracks the values of the variables stored in its current precision while assuming that the remaining variables may have any possible value. It iteratively increases its precision, i.e., the variables to track, combining counterexample-guided abstraction refinement [28] with path-prefix slicing [22] and refinement selection [21]. Value analysis is efficient if few variable values need to be tracked, but it may get stuck in loops or suffer from a large state space in case variables are assigned many different values.

*Predicate analysis.* CPAchecker's predicate analysis uses predicate abstraction with adjustable-block encoding (ABE) [16]. ABE is configured to abstract at loop heads and uses the strongest postcondition at all remaining locations. To compute the set of predicates (its precision), it uses counterexample-guided abstraction refinement [28] combined with lazy refinement [43] and interpolation [41]. While the predicate analysis is powerful and often summarizes loops easily, successor computation may require expensive SMT solver calls.

For both analyses, a CoVeriTest configuration specifies how Alg. 3 reuses the ARGs returned by previous analysis runs to set up the initial ARG and condition. In our experiments, we consider the following types of reuses.


Finally, we need to fix the time limit for each analysis. We want to find out whether switches between analyses are important to the CoVeriTest approach. Therefore, we chose four limits (10 s, 50 s, 100 s, 250 s) that are applied to both analyses and trigger switches often, sometimes, or rarely. Additionally, we want to study whether it is advantageous if the time CoVeriTest spends in a round is not equally spread among the analyses. Thus, we come up with two additional time limit pairs: (20 s, 80 s) and (80 s, 20 s).

We combine all nine reuse types with the six time limit pairs, which results in 54 CoVeriTest configurations. All 54 configurations aim at generating tests to cover the assume edges of a program.
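The configuration space can be enumerated with a few lines of Python. The reuse-type names are taken from the experiment descriptions in Sect. 4.2, and each time limit pair is (value analysis, predicate analysis) in seconds. The helper `switches` is hypothetical; it assumes that every analysis call uses its full limit and that a run has 900 s (15 min) in total.

```python
from itertools import product

REUSE_TYPES = ["plain", "condv", "condp", "condv,p",
               "reuse-prec", "reuse-arg", "condv+r", "condp+r", "condv,p+r"]
TIME_LIMITS = [(10, 10), (50, 50), (100, 100), (250, 250), (20, 80), (80, 20)]

# Every reuse type is combined with every time limit pair.
CONFIGS = list(product(REUSE_TYPES, TIME_LIMITS))

def switches(limit_pair, total=900):
    """Number of switches between the two analyses if each call uses its full limit."""
    elapsed, runs = 0, 0
    while elapsed < total:
        elapsed += limit_pair[runs % 2]
        runs += 1
    return runs - 1

SWITCHES = {pair: switches(pair) for pair in TIME_LIMITS}
```

Under these assumptions, the pair (10 s, 10 s) switches 89 times within a run, whereas (250 s, 250 s) switches only three times.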

Tools. For CoVeriTest, we used the implementation in CPAchecker version 29 347. Moreover, we compare CoVeriTest against the two best tools VeriFuzz [26] and Klee [23] from Test-Comp'19 (in the versions submitted to Test-Comp'19<sup>7</sup>). The tool VeriFuzz is based on the evolutionary fuzzer AFL and uses verification techniques to compute initial input values and parameters for AFL. Klee applies symbolic execution. To compare CoVeriTest against Klee and VeriFuzz, we use the validator TBF Test-Suite Validator v1.2<sup>8</sup> to measure branch coverage. TBF Test-Suite Validator is based on gcov<sup>9</sup>.

Programs. CoVeriTest, Klee, and VeriFuzz produce tests for C programs. All three tools participated in Test-Comp'19. Thus, for comparison of the three tools, we consider all 1 720 tasks of the Test-Comp'19 benchmark set<sup>10</sup> that support the branch-coverage property. Since we do not need to execute tests for the comparison of the different CoVeriTest configurations, we evaluated them on a larger benchmark set, which contains all 6 703 C programs from the well-established SV-benchmark set<sup>11</sup> in the version tagged svcomp18.

Computing Resources. We run our experiments on machines with 33 GB of memory and an Intel Xeon E3-1230 v5 CPU with 8 processing units and a frequency of 3.4 GHz. The underlying operating system is Ubuntu 18.04 with Linux kernel 4.15. As in Test-Comp'19, for test generation we grant each run a maximum of 8 processing units, 15 min of CPU time, and 15 GB of memory, and for test-suite execution (required to compare against Klee and VeriFuzz), the TBF Test-Suite Validator is granted 2 processing units, 3 h of CPU time, and 7 GB of memory per run. We use BenchExec [20] to enforce the limits of a run.

Availability. Our experimental data are available online<sup>12</sup> [13].

<sup>7</sup> https://gitlab.com/sosy-lab/test-comp/archives-2019/tree/testcomp19/2019

<sup>8</sup> https://gitlab.com/sosy-lab/test-comp/archives-2019/blob/testcomp19/2019/tbf-testsuite-validator.zip

<sup>9</sup> https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

<sup>10</sup> https://github.com/sosy-lab/sv-benchmarks/tree/testcomp19

<sup>11</sup> https://github.com/sosy-lab/sv-benchmarks

<sup>12</sup> https://www.sosy-lab.org/research/coop-testgen/

Fig. 3. Comparing relative coverage (number of covered goals divided by maximal number of covered goals) achieved by CoVeriTest configurations with different time limits. All configurations let analyses discard their own knowledge gained in previous executions.

#### 4.2 Experiments

Claim 1 (Reduce switching when discarding own results). Four types of reuse (namely, plain, condv, condp, and condv,p) let the analyses discard their own knowledge from their previous executions. For each of these types, we compare the coverage achieved by all six CoVeriTest configurations that use this type<sup>13</sup>. More concretely, for all six CoVeriTest configurations applying the same reuse type, we first compute for each program the maximum over the number of covered goals achieved by each of these six configurations for that program. Then, for each of the six CoVeriTest configurations that use that reuse type, we divide the number of covered goals achieved for a program by the respective maximum computed. We call this measure *relative coverage* because the value is relative to the maximum and not the total number of goals. Figure 3 shows box plots per reuse type. The box plots show the distribution of the relative coverage. The closer the bottom border of a box is to value one, the higher coverage is achieved. For all four reuse types, the fourth box plot has the bottom border closest to value one. Since the fourth box plot belongs to a configuration that grants each analysis 250 s per round (highest limit considered, only three switches), the claim holds.

<sup>13</sup> Note that those six configurations only differ in the analyses' time limits.

Fig. 4. Comparing relative coverage (number of covered goals divided by maximal number of covered goals) achieved by CoVeriTest configurations when using different time limits and a fixed reuse type. All considered configurations let analyses reuse knowledge from their own, previous execution.

Claim 2 (Favor powerful analysis when reusing own results). Five types of reuse (namely, reuse-prec, reuse-arg, condv+r, condp+r, and condv,p+r) let analyses reuse knowledge from their own, previous execution. Similar to the previous claim, we compute for each of these types the relative coverage of all six configurations using this particular type of reuse. For each reuse type, Fig. 4 shows box plots of the distributions of the relative coverage. As before, a bottom border closer to value one reflects higher coverage. In all five cases, the last box plot has the bottom border closest to value one. The last box plots represent CoVeriTest configurations that grant the value analysis 20 s and the predicate analysis 80 s in each round. Since the predicate analysis, which gets more time per round, is more powerful than the value analysis, our claim is valid.<sup>14</sup>
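The relative coverage measure used for Claims 1 and 2 can be written down in a few lines of Python. The input layout (a dictionary from configuration to per-program goal counts) and the example numbers are made up for illustration. The text does not say how a program is treated when no configuration covers any goal, so the sketch assumes a relative coverage of 1.0 in that case.

```python
def relative_coverage(covered):
    """covered: {config: {program: number of covered goals}} for one reuse type."""
    programs = set()
    for per_program in covered.values():
        programs.update(per_program)
    # Per program: maximum number of covered goals over all compared configurations.
    best = {p: max(c.get(p, 0) for c in covered.values()) for p in programs}
    return {
        config: {p: (n / best[p] if best[p] else 1.0)   # assumption for best == 0
                 for p, n in per_program.items()}
        for config, per_program in covered.items()
    }

# Made-up goal counts for two configurations that differ only in their time limits.
EXAMPLE = {
    "10s/10s": {"prog1.c": 8, "prog2.c": 5, "prog3.c": 0},
    "250s/250s": {"prog1.c": 10, "prog2.c": 5, "prog3.c": 0},
}
REL = relative_coverage(EXAMPLE)
```

A configuration whose values are all close to one covers, for almost every program, nearly as many goals as the best configuration in its group, which is what a high bottom border of its box plot expresses.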

Claim 3 (Better reuse own results). So far, we know how to configure time limits. Now, we want to find out how to reuse information from previous analysis runs. For each reuse type, we select from the six available configurations the configuration that performed best. Again, we use the relative coverage to compare the resulting nine configurations. Figure 5 shows box plots of the distributions of the relative coverage. The first four box plots show configurations in which analyses discard their own results, while the last five box plots refer to configurations in which analyses reuse knowledge from their own, previous executions. Since the last five boxes are smaller than the first four and their bottom borders are closer to one, the last five configurations achieve higher coverage. Hence, our claim holds. Moreover, from Fig. 5 we conclude that it is best to reuse the ARG (although condv+r and condp+r are close by).

<sup>14</sup> This insight is independently partially backed by a sequential combination of explicit-value analysis and predicate analysis that performed well in SV-COMP 2013 [62].

Fig. 5. Comparing relative coverage achieved by CoVeriTest configurations applying different strategies to reuse information gained by previous verifier runs.

Fig. 6. Compares the coverage achieved by CoVeriTest (best configuration) with the coverage achieved when running CoVeriTest's analyses alone or in parallel

Claim 4 (Interleave multiple analyses rather than use one of them). To evaluate whether CoVeriTest benefits from interleaving, we compare CoVeriTest against the analyses used by it. CoVeriTest interleaves value and predicate analysis. Figures 6(a) and 6(b) show scatter plots that compare for each program the coverage, i.e., number of covered goals divided by number of total goals, achieved by the best CoVeriTest configuration (x-axis) with the coverage achieved when only using either value or predicate analysis for test generation. Note that we excluded those programs from the scatter plots for which we miss the number of covered goals for at least one test generator, e.g., due to timeout of the analysis. Figure 6(a) compares CoVeriTest and value analysis; we see that almost all points are in the lower right half. Thus, CoVeriTest typically achieves higher coverage than value analysis alone. Figure 6(b), comparing CoVeriTest with predicate analysis, is more diverse. About 54% of the points are on the diagonal, i.e., CoVeriTest and predicate analysis cover the same number of goals. The upper left half contains 19% of the points, i.e., predicate analysis alone achieves higher coverage. These points reflect, for example, float programs and ECA programs without arithmetic computations. In contrast, CoVeriTest achieves higher coverage in 27% of the programs. CoVeriTest is beneficial for programs that only need few variable values to trigger the branches, like ssh programs or programs from the product-lines subcategory. CoVeriTest also profits from the value analysis when considering ECA programs with arithmetic computations, since the variables have a fixed value in each loop iteration. All in all, CoVeriTest performs slightly better than predicate analysis alone.

Fig. 7. Compares the branch coverage achieved by CoVeriTest (best configuration) with the branch coverage achieved by existing state-of-the-art test-generation tools

Claim 5 (Interleave rather than parallelize). Figure 6(c) shows a scatter plot that compares for each program the coverage achieved by CoVeriTest (x-axis) and a test generator that runs the value analysis and the predicate analysis in parallel<sup>15</sup>. As before, we exclude programs for which we could not get the number of covered goals for at least one of the analyses. Looking at Fig. 6(c), we observe that many points (60%) are on the diagonal, i.e., the achieved coverage is identical. Moreover, CoVeriTest performs better for 30% (lower right half), while approximately 10% of the points are in the upper left half. Since CoVeriTest achieves the same or better coverage results in about 90% of the cases, it should be preferred over parallelization. This is no surprise since we showed that a test generator should favor the more powerful analysis (which CoVeriTest does, but parallelization evenly distributes CPU time).

<sup>15</sup> The test generator uses CPAchecker's parallel algorithm and lets the two analyses share information about covered test goals.
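The way the scatter plots of Fig. 6 are read can be made explicit with a small Python helper. It takes one point per program, with the coverage of CoVeriTest on the x-axis and the coverage of the competing approach on the y-axis, and reports the share of points on the diagonal, in the lower right half, and in the upper left half. The function name `classify` and the ten points are made up for illustration and are not experimental data.

```python
def classify(points, eps=1e-9):
    """Share of points on the diagonal, below it (x > y), and above it (x < y)."""
    counts = {"diagonal": 0, "lower_right": 0, "upper_left": 0}
    for x, y in points:
        if abs(x - y) <= eps:
            counts["diagonal"] += 1        # both approaches achieve the same coverage
        elif x > y:
            counts["lower_right"] += 1     # the approach on the x-axis is better
        else:
            counts["upper_left"] += 1      # the approach on the y-axis is better
    return {region: n / len(points) for region, n in counts.items()}

# Illustrative points (x = CoVeriTest coverage, y = coverage of the other approach).
POINTS = [(0.5, 0.5), (1.0, 1.0), (0.25, 0.25), (0.75, 0.75), (0.9, 0.9), (0.0, 0.0),
          (0.8, 0.6), (0.7, 0.2), (1.0, 0.5),
          (0.3, 0.4)]
SHARES = classify(POINTS)
```

With these illustrative points, the approach on the x-axis is at least as good for 90% of the programs, which is the kind of argument made for Claim 5.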

Claim 6 (CoVeriTest complementary). Our goal is to compare CoVeriTest and the two best tools of Test-Comp'19 [4]: VeriFuzz and Klee. All three tools aim at constructing test suites with high branch coverage. Thus, we use branch coverage as comparison criterion. We measure branch coverage with TBF Test-Suite Validator. Figure 7 shows two scatter plots. Each plot compares branch coverage achieved by CoVeriTest and by one of the other techniques.<sup>16</sup> Points in the lower right half indicate that CoVeriTest achieved higher coverage. Looking at the two scatter plots, we observe that there exist programs for which CoVeriTest performs better and vice versa. Generally, we observed that CoVeriTest has problems with array tasks and ECA tasks. We already know from verification that CPAchecker sometimes lacks refinement support for array tasks. Moreover, the problem with the ECA tasks is that CPAchecker splits conditions with conjunctions or disjunctions (which ECA tasks contain a lot of) into multiple assume edges. Thus, the number of test goals is much larger than the number of actual branches to be covered. However, CoVeriTest seems to benefit from splitting for some of the float tasks. Additionally, CoVeriTest is often better on tasks of the sequentialized subcategory. We think that CoVeriTest benefits from the value analysis since the tasks of the sequentialized subcategory contain lots of branch conditions checking for a specific value or interpreting variable values as booleans. All in all, CoVeriTest is not always best, but is also not dominated. Thus, CoVeriTest complements the existing approaches.

<sup>16</sup> Note that the scatter plots only contain points that have a positive x and y value because there exist different reasons (timeout, out of memory, tool failure, etc.) why we might get no or a zero coverage value from the test validator. The plots contain points for about 98% of the 1 720 programs.

#### 4.3 Threats to Validity

All our CoVeriTest configurations consider the same two analyses. Our results might not apply if using CoVeriTest with a different set of analyses. In our experiments, we used benchmark programs instead of real-world applications. Although the benchmark set is diverse and well-established, our results may not carry over into practice.

The validator TBF Test-Suite Validator might contain bugs that result in wrong coverage numbers. However, the validator was used in Test-Comp'19 already, and is based on the well-established coverage-measurement tool gcov.

For the comparison of the CoVeriTest configurations as well as the comparison of CoVeriTest with the single analyses and the parallel approach, we relied on the number of covered goals reported by CoVeriTest. Invalid counterexamples could be used to cover test goals. The analyses used by CoVeriTest apply CEGAR approaches and should detect spurious counterexamples. Moreover, these analyses run in the SV-COMP configuration of CPAchecker and are tuned to not report false results. Another problem is that whenever CPAchecker does not output statistics (due to timeout, out of memory, etc.), we use the last number of covered goals reported in the log. However, this might be an underapproximation of the number of covered goals. All these problems do not occur in the comparison of CoVeriTest with Klee and VeriFuzz, in which the coverage is measured by the validator. Thus, this comparison still supports the value of CoVeriTest.

## 5 Related Work

CoVeriTest interleaves reachability analyses to construct tests for C programs. To enable cooperation, CoVeriTest extracts information from ARGs constructed by previous analysis runs.

A few tools use reachability analyses for test generation. Blast [5] considers a target predicate p and generates a test for each program location that can be reached with a state fulfilling the predicate p. For test generation, Blast uses predicate abstraction. FShell [44–46] and CPA/Tiger [12] generate tests for a coverage criterion specified in the FShell query language (FQL) [46]. Both transform the FQL specification into a set of test-goal automata and check for each automaton whether its final state can be reached. FShell uses CBMC to answer those reachability queries and CPA/Tiger uses predicate abstraction.

Various combinations have been proposed for verification [2,10,11,14,25,27, 29–31,35,37,40,50,64] and test-suite generation [1,32,34,36,38,47,51,54,56,59, 60,63]. We focus on combinations that interleave approaches. SYNERGY [40] and DASH [2] alternate test generation and proof construction to (dis)prove a property. Similarly, SMASH [37] combines underapproximation with overapproximation. Interleaving is also used in test generation. Hybrid concolic testing [54] interleaves random testing with symbolic execution. When random testing gets stuck, symbolic execution is started from the current state. As soon as a new goal is covered, symbolic execution hands over to random testing providing the values used to cover the goal. Similarly, Driller [60] and Badger [56] combine fuzzing with concolic execution. However, they only exchange inputs. Xu et al. [51,63] interleave different approaches to augment test suites. The approach closest to CoVeriTest is abstraction-driven concolic testing [32]. Abstraction-driven concolic testing interleaves concolic execution and predicate analysis. Furthermore, it uses conditions extracted from the ARGs generated by the predicate analysis to direct the concolic execution towards feasible paths. Abstraction-driven concolic testing can be seen as one particular configuration of CoVeriTest.

Also, ARG information has been reused in different contexts. Precision reuse [19] uses the precision determined in a previous analysis run to reverify a modified program. Similarly, extreme model checking [42] adapts an ARG constructed in a previous analysis to fit to the modified program. CPA/Tiger [12] transforms an ARG that was constructed for one test goal such that it fits to a new test goal. Lazy abstraction refinement [43] adapts an ARG to continue exploration after abstraction refinement. Configurable program certification [48,49] constructs a certificate from an ARG, which can be used to reverify a program. Similarly, reachability tools like CPAchecker construct witnesses [6,7] from ARGs. Conditional model checking [10,14] constructs a condition from an ARG when a verifier gives up. The condition describes the remaining verification task and is used by a subsequent verifier to restrict its exploration.

## 6 Conclusion

Testing is a standard technique for software quality assurance. But state-of-the-art techniques still miss many bugs that involve sophisticated branching conditions [17]. It turns out that techniques performing abstract reachability analyses are well-suited for this task. They simply need to check the reachability of every branch and generate a test for each positive check. However, in practice, for every such technique there exist reachability queries on which the technique is inefficient or fails [8]. We propose CoVeriTest to overcome these practical limitations. CoVeriTest interleaves different reachability analyses for test generation. We experimented with various configurations of CoVeriTest, which vary in the time limits of the analyses and the type of information exchanged between different analysis runs. CoVeriTest works best when each analysis resumes its exploration, different analyses only share test goals, and more powerful analyses get larger time budgets. Moreover, a comparison of CoVeriTest with (a) the reachability analyses used by CoVeriTest and (b) state-of-the-art test-generation tools witnesses the benefits of the new CoVeriTest approach.

CoVeriTest participated in Test-Comp 2019 [4] and achieved rank 3 (out of 9) in both categories, bug finding and branch coverage.<sup>17</sup>

In the future, we plan to integrate further analyses, e.g., bounded model checking or symbolic execution, into CoVeriTest and to evaluate CoVeriTest on real-world applications.

<sup>17</sup> https://test-comp.sosy-lab.org/2019/results/

## References


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Pardis: Priority Aware Test Case Reduction**

Golnaz Gharachorlu and Nick Sumner

Simon Fraser University, Burnaby, BC, Canada {ggharach,wsumner}@sfu.ca

**Abstract.** Test cases play an important role in testing and debugging software. Smaller tests are easier to understand and use for these tasks. Given a test that demonstrates a bug, *test case reduction* finds a smaller variant of the test case that exhibits the same bug. Classically, one of the challenges for test case reduction is that the process is slow, often taking hours. For hierarchically structured inputs like source code, the state of the art is Perses, a recent grammar aware and queue driven approach for test case reduction. Perses traverses nodes in the abstract syntax tree (AST) of a program (test case) based on a priority order and tries to reduce them while preserving syntactic validity.

In this paper, we show that Perses' reduction strategy suffers from *priority inversion*, where significant time may be spent trying to perform reduction operations on lower priority portions of the AST. We show that this adversely affects the reduction speed. We propose Pardis, a technique for priority aware test case reduction that avoids priority inversion. We implemented Pardis and evaluated it on the same set of benchmarks used in the Perses evaluation. Our results indicate that compared to Perses, Pardis is able to reduce test cases 1.3x to 7.8x faster and with 46% to 80% fewer queries.

**Keywords:** Test case reduction · Automated debugging · Priority aware reduction

## **1 Introduction**

Test case reduction is a technique that aids in testing and debugging software. When an input for a program causes the program to exhibit a property of interest, like a bug, finding a smaller input that also exhibits the property can help to explain the behavior [1–3]. Given an input I ∈ 𝕀 and an oracle ψ : 𝕀 → 𝔹 that performs a test and returns true iff a property holds, test case reduction aims to find a smaller input I′ such that ψ(I′) = true. Often, this problem is approached through Delta Debugging (DD), a longstanding and effective algorithm for test case reduction that essentially generalizes binary search [2]. However, for inputs with significant structure, generic DD can perform poorly, requiring significant time and not performing much reduction [3,4]. For compilers in particular, where the inputs must be valid programs, this has led to specialized techniques like Hierarchical Delta Debugging [3,4], language specific reducers like C-Reduce [5], and most recently to Syntax Guided Program Reduction as seen in Perses [6].

Syntax Guided Program Reduction (SGPR) is the present state of the art for compiler targeted test case reduction. The intuition behind SGPR is that the grammar defining the language of inputs eliminates many invalid sub-inputs from the search space. For example, when an input must adhere to the C programming language [7], removing the return type of a function declaration would not be valid because the C grammar specifies that the return type is required. Such syntactically invalid inputs are removed from the search space by SGPR.

Perses, a form of SGPR, takes as arguments not only a program p and oracle ψ, but also the context free grammar G of valid inputs [6]. It transforms the grammar so that removable parts of the input can be identified by the names of the grammar rules used to parse them. This also normalizes the grammar so that all removable components are expressed through quantifiers in an extended context free grammar [8], i.e. optionality (?) and lists (\*, +). This transformation is illustrated in Fig. 1. Notice, for instance, that the recursive rule BAR denoting a list is transformed (=⇒) into a Kleene-+ quantified list. Individual elements of the list may be removed while preserving syntactic validity. Perses then parses the input of interest into an abstract syntax tree (AST) and traverses the AST while trying to (1) remove optional nodes and (2) perform DD to minimize the children of nodes representing lists. The grammar transformations have the benefit of making many syntactically correct removals easy and efficient to locate.


**Fig. 1.** Overview of Perses grammar transformations for SGPR.

Perses has significantly improved the speed of program reduction. However, it still takes several hours to reduce some inputs. Consider the code in Listing 1.1 along with its AST in Fig. 3. This example is similar to a C program generated by the compiler testing tool CSmith [9]. In this example, Perses first considers the root node with ID 1 of the AST. Since the rule for this node ends in star, it is a list node, and its children are the elements of the list. Thus, Perses applies DD to the list of children for node 1 to minimize the number of children. When such lists are long, significant time can be devoted to this task. We show in Sect. 4 that this can lead to substantial *stalls* in reduction, where no progress is made while a list is being processed. However, most of the children of this node have low *token weight*, the number of tokens beneath a given node, denoted by w: in Fig. 3. Indeed, greater value would be found by focusing on just *one* of its children, node 5, which contains the majority of the input beneath it. By spending greater effort up front on portions of the AST of lesser value, Perses suffers from a form of *priority inversion*. Priority inversion occurs when a low priority task is scheduled instead of a high priority task. In this case, Perses focuses on removing low token weight nodes instead of high token weight nodes. Indeed, Perses may even fail to remove elements that would enable better reduction success overall. In this case, the declarations of foo, S, and d are used within the code beneath node 5. Thus, those uses need to be eliminated *before* any of the declarations can be removed successfully. In practice, we find that priority inversion has a significant impact on reduction time in SGPR.
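To make the notion of token weight concrete, the following is a minimal sketch. The `Node` class, the labels, and the toy tree are assumptions made for illustration and are not taken from the Pardis implementation; the tree only mimics the shape of the running example, with a few short declarations next to one large subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical AST node: a leaf carries one token, an inner node has children."""
    label: str
    token: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def token_weight(node: Node) -> int:
    """Number of tokens in the subtree rooted at `node`."""
    own = 1 if node.token is not None else 0
    return own + sum(token_weight(child) for child in node.children)


def leaf(tok: str) -> Node:
    return Node(label="tok", token=tok)


# A list node with three short declarations and one large function body.
decls = [Node("decl", children=[leaf("int"), leaf(name), leaf(";")]) for name in "abc"]
body = Node("func", children=[leaf(f"t{i}") for i in range(20)])
root = Node("list", children=decls + [body])

weights = [token_weight(child) for child in root.children]
heaviest = max(root.children, key=token_weight)
print(weights, heaviest.label)
```

In this toy tree, a reducer that treats all four children of the list alike spends most of its trials on the three light declarations, although a single child holds most of the tokens.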

To address priority inversion, we have developed *priority aware reduction strategies* for program reduction. By focusing the reduction effort on the nodes of the AST that cover the greatest number of tokens, we prioritize reduction of the most complex parts of the input first. This has multiple important benefits: (1) Dependencies between program elements are more likely to be broken by eliminating the complex uses first. (2) Stalls in reduction from unsuccessful rounds of DD can be mitigated. (3) By removing large portions of an input earlier on, each oracle query to ψ can take less time because smaller inputs tend to be faster to check. We have designed and evaluated a tool, Pardis, that makes use of these techniques and found that it leads to consistent and significant performance improvements over Perses on the Perses benchmarks [6].

In summary, this paper makes the following contributions:


## **2 Background and Motivation**

Consider again the example in Fig. 3 and suppose that the oracle (ψ) checks that this program p should print "Hello World!" on line 24 (marked with ∗). Thus, the smallest subprogram for which ψ returns true is the main function with the desired print statement.

To search for this smaller input inside the original input, Perses traverses the AST using a priority queue ordered by the token weight. In each trial, the node with the maximum weight is removed from the work queue and traversed. In our example, the queue starts out containing only the root of the AST, node 1. Perses performs specific reduction operations on different types of nodes during traversal. For instance, on optional nodes, Perses tries to remove the optional child node. For list nodes, Perses minimizes the list of children using DD. Any *remaining* children of the traversed node are then added to the priority queue in order to be traversed in the future.

**Fig. 2.** One round of removal trials in Perses, Pardis and Pardis Hybrid for the AST in Fig. 3. Numbers are node IDs.

**Fig. 3.** AST of the program in Listing 1.1. w denotes the token weight of each node.

Observe that in this example, Perses will first examine node 1 and remove it from the queue. Because node 1 is a list node, DD is applied to its children. Different combinations of children are removed from node 1 and the result is checked by ψ to find a smaller input. First, all children are removed and ψ is checked. After this fails, the first half of the children (2 and 3) are removed, but ψ returns false again because this removes required declarations. Since removing the second half of the children (4 and 5) also fails, the process continues recursively. First DD tries shrinking the list by *removing* each individual child, and next it tries *only keeping* each individual child. Ultimately none of the trials succeed, so all children are added to the queue, and reduction continues with node 5. The intervening node 6 is not tested by SGPR because it is not syntactically removable. The next node removed from the work queue is node 7. This continues until the queue is empty. The precise trials exercised in this process are illustrated in Fig. 2(a). Note that 16 steps elapse until a successful trial occurs.
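For readers unfamiliar with DD on lists, the following is a minimal sketch of the classic ddmin scheme applied to a list of children. It is an illustration under assumptions: the function name, the initial check of the empty list, and the exact order of trials are choices made here and may differ from the trial order used by Perses and shown in Fig. 2(a).

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(items: Sequence[T], oracle: Callable[[List[T]], bool]) -> List[T]:
    """Return a sublist on which `oracle` holds and from which no single item can be dropped."""
    if oracle([]):                      # first try removing everything
        return []
    current, n = list(items), 2         # n is the current number of chunks
    while len(current) >= 2:
        size = max(len(current) // n, 1)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        complements = [
            [x for j, c in enumerate(chunks) if j != i for x in c]
            for i in range(len(chunks))
        ]
        for candidate in chunks:        # try keeping only one chunk
            if oracle(candidate):
                current, n = candidate, 2
                break
        else:
            for candidate in complements:   # try removing one chunk
                if oracle(candidate):
                    current, n = candidate, max(n - 1, 2)
                    break
            else:
                if size == 1:           # finest granularity reached, nothing removable
                    break
                n = min(n * 2, len(current))
    return current


# Hypothetical oracle: the property holds as long as "main" is kept.
reduced = ddmin(["decl_foo", "decl_S", "decl_d", "main"], lambda kept: "main" in kept)
print(reduced)
```

When no chunk or complement passes the oracle, as for node 1 in the running example, every level of granularity is exhausted before the list is abandoned, which is where the stalls described above come from.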

While the priorities used by Perses are controlled by the token weight, they determine how the *children* of the traversed nodes are removed. Thus, any node whose *parent* in the AST is a list is given the same priority as all other elements in the list. This is because DD recursively tries to minimize the entire list until no single element can be removed, regardless of the priorities of individual list elements. As a result, Perses must employ DD on the entirety of the children of node 1 even though it would be more beneficial to focus on just one child, node 5.

Instead, Pardis more directly models the priorities. We note that in an optional or list node, such as node 1, each child may be removed in a syntactically valid fashion. We call such removable nodes *nullable*. When traversing a nullable node in the AST, we can simply try directly to remove it, adding its children if the removal fails. For instance, in the running example, we would visit node 1 first. Because node 1 cannot be removed, we would simply add its children to the priority queue. Note that all children of node 1 are nullable, but node 5 has the highest *token weight*. Thus, we next select node 5 to traverse, but removing it also fails. From the given token weights, we next traverse node 6, which is syntactically not removable, and then node 7, which we attempt to remove without success. Next, node 11 is visited and successfully removed. Removing node 11 *enables the removal of* nodes 4, 3 *and* 2. Thus, they are removed in a single pass of the tree using Pardis, whereas Perses would require multiple traversals of the AST to remove them. This process continues until the desired output is achieved. As seen in Fig. 2(b), just 4 steps elapse until the first successful trial removes node 11.

Note that in this example, Pardis is able to reduce to the desired output in a *single pass*, while Perses requires multiple passes of the AST. In practice, all program reduction techniques, including Pardis, continue until a fixed point is reached. However, Pardis can achieve greater reduction in a single traversal of the AST, accelerating convergence on the fixed point.

This priority aware approach can still have drawbacks, however. After focusing on the highest priority nodes, there may be many lower priority nodes remaining. For example, there are multiple remaining nodes of weight 7 in the tree after performing the reduction by Pardis as described above. We also show experimentally that these lower priority nodes occur in practice in Sect. 5. The above approach of Pardis considers each node *one at a time*, which can have poor performance when reducing such long lists. We thus propose a *hybrid* approach that still prioritizes nodes by maximum token weight but also uses a list based reduction technique for spans of nodes that have *the same* token weight. This hybrid approach is able to achieve the benefits of being priority aware while still avoiding the cost of considering each node of the AST individually.

Section 3 presents the algorithms behind these techniques in detail.

## **3 Approach**

Recall that the core of Pardis, similar to Perses, maintains a priority queue of the nodes in an AST and traverses the nodes in order to process them. It also makes use of Perses Normal Form, the result of the grammar transformations that Perses introduced [6]. The key difference is that instead of using the token weight of a parent node to determine when its nullable children may be removed, Pardis identifies all nullable nodes (see Sect. 3.2) and uses their token weights directly to prioritize the search. The core algorithm for this process is quite straightforward and presented in Algorithm 1.


Line 1 of the algorithm constructs the priority queue (a max-heap), initializing it with the root of the AST and using a parameterizable priority ρ. ρ is simply a function that takes a node and returns its priority as a tuple. The priority queue selects the element with a lexicographically maximal priority, so ties on the *first* element of the priority tuple are broken by the *second* element and so on. As seen in Fig. 4, for Pardis, ρ<sub>Pardis</sub> returns a pair of numbers, the token weight of the node and the position of the node in a decreasing, right-to-left, breadth first search. The specific breadth first order means that for an AST with n nodes, bfsOrder(p.root) = n, the last child c of p.root has bfsOrder(c) = n-1, and so on. Thus, if several nodes have the same token weight, the one highest in the AST and furthest to the right is selected next. This ordering decreases the chances of trying to remove a declaration before its uses [10].

Line 2 starts the core of the algorithm. While there are more nodes to explore in the queue, the node with the next highest priority is considered. If it is nullable and can be successfully removed, we remove it from the AST, otherwise we add its children to the queue so that they will also be traversed.
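The steps just described can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the authors' C++ implementation: the `Node` class, the token based oracle interface, and the toy tree are hypothetical, and only a single pass is shown (no fixed point loop, no nullability pruning).

```python
import heapq
from collections import deque
from typing import Callable, Dict, List, Optional


class Node:
    """Hypothetical AST node; leaves hold a token, `nullable` marks removable nodes."""

    def __init__(self, label: str, children=(), token: Optional[str] = None,
                 nullable: bool = False):
        self.label, self.token, self.nullable = label, token, nullable
        self.children: List["Node"] = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self

    def tokens(self) -> List[str]:
        if self.token is not None:
            return [self.token]
        return [t for child in self.children for t in child.tokens()]


def bfs_order(root: Node) -> Dict[int, int]:
    """Decreasing, right-to-left BFS numbering: root gets n, its last child n-1, ..."""
    visited, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node)
        queue.extend(reversed(node.children))
    return {id(node): len(visited) - i for i, node in enumerate(visited)}


def pardis_pass(root: Node, oracle: Callable[[List[str]], bool]) -> int:
    """One priority aware pass; returns the number of oracle queries issued."""
    order = bfs_order(root)

    def rho(node: Node):
        # heapq is a min-heap, so both components are negated to pop the maximum.
        return (-len(node.tokens()), -order[id(node)])

    heap, queries = [(*rho(root), root)], 0
    while heap:
        _, _, node = heapq.heappop(heap)
        if node.nullable and node.parent is not None:
            siblings = node.parent.children
            index = siblings.index(node)
            siblings.pop(index)               # tentatively remove the node
            queries += 1
            if oracle(root.tokens()):
                continue                      # removal succeeded; skip its subtree
            siblings.insert(index, node)      # removal failed; restore the node
        for child in node.children:
            heapq.heappush(heap, (*rho(child), child))
    return queries


def leaf(tok: str) -> Node:
    return Node("tok", token=tok)


# Toy input: two declarations whose uses sit inside one heavy block.
d_foo = Node("decl", [leaf("decl_foo")], nullable=True)
d_s = Node("decl", [leaf("decl_S")], nullable=True)
block = Node("block", [leaf("use_foo"), leaf("use_S"), leaf("x")], nullable=True)
show = Node("stmt", [leaf("print")], nullable=True)
main = Node("func", [block, show], nullable=True)
root = Node("list", [d_foo, d_s, main])


def oracle(toks: List[str]) -> bool:
    # Hypothetical property: "print" survives and every use still has its declaration.
    uses_ok = all(f"use_{n}" not in toks or f"decl_{n}" in toks for n in ("foo", "S"))
    return "print" in toks and uses_ok


queries = pardis_pass(root, oracle)
print(root.tokens(), queries)
```

In this toy tree the heavy block is removed before either declaration is tried, so both declarations can then be removed on their first attempt within the same pass.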

While the algorithm is surprisingly simple, we have found it to perform significantly better than the state of the art in practice. As we explore in Sect. 4.2, this results from prioritizing the search toward those portions of the input where reduction can have the greatest impact. To more closely compare with Perses, consider a version of Perses that upon visiting a list or optional node only tries removing each child of that node once<sup>1</sup>. This "one node at a time" variant of Perses can also be implemented using Algorithm 1 by carefully choosing the priority formula ρ. Because Perses considers removing the *children* of the nodes it traverses, it actually prioritizes the work queue using the token weight of the *parent* rather than the token weight of nullable nodes being considered for removal. This leads to the alternative prioritizer ρ<sub>Perses</sub> presented in Fig. 4. Observe that all children of a list node receive the same token weight, that of the entire list. This can inflate the priority of some nodes in the work queue and leads to poor performance.


**Fig. 4.** Prioritizers used for Pardis, node at a time Perses, and Pardis Hybrid.

Like other program reduction algorithms [3,5,6,11,12], Algorithm 1 is used to compute a fixed point. That is, in practice the algorithm is repeated until no further reductions can be made. As in prior work, we omit this from our presentation for clarity. In theory, this means that the worst case complexity of the technique is O(n<sup>2</sup>) where n is the number of nodes in the AST. This arises when only one leaf of the AST is removed in each pass through the algorithm. In practice, most nodes are not syntactically nullable, and we show in Sect. 4.1 that performance of Pardis exceeds the state of the art.
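The fixed point computation and its quadratic worst case can be illustrated with a small sketch. Everything here is hypothetical: a flat token list stands in for the AST, `one_pass` is a simple left-to-right removal pass rather than Algorithm 1, and the oracle is constructed adversarially so that only the last token is removable in each pass.

```python
from typing import Callable, Dict, List, Tuple

ORIGINAL = ["t0", "t1", "t2", "t3"]


def oracle(tokens: List[str]) -> bool:
    # Adversarial property: only non-empty prefixes of the original input pass.
    return len(tokens) >= 1 and tokens == ORIGINAL[:len(tokens)]


def one_pass(tokens: List[str], test: Callable[[List[str]], bool],
             stats: Dict[str, int]) -> List[str]:
    """Try removing each token once, left to right, keeping successful removals."""
    kept, i = list(tokens), 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        stats["queries"] += 1
        if test(candidate):
            kept = candidate
        else:
            i += 1
    return kept


def reduce_to_fixed_point(tokens: List[str], test: Callable[[List[str]], bool],
                          stats: Dict[str, int]) -> Tuple[List[str], int]:
    """Repeat `one_pass` until a pass removes nothing."""
    passes = 0
    while True:
        reduced = one_pass(tokens, test, stats)
        passes += 1
        if len(reduced) == len(tokens):
            return reduced, passes
        tokens = reduced


stats = {"queries": 0}
result, passes = reduce_to_fixed_point(ORIGINAL, oracle, stats)
print(result, passes, stats["queries"])
```

With four tokens the passes issue 4, 3, 2, and 1 queries, that is n(n+1)/2 in total, which matches the O(n<sup>2</sup>) bound when each pass removes a single leaf.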

In addition, while we focus on *removing* nodes of the AST, Perses also tries to *replace* non-list and -optional nodes with compatible nodes in their subtrees. We do not focus on this aspect of the algorithm. In practice, we found it to significantly hurt performance (see Sect. 4.1) and we consider efficient replacement strategies to be orthogonal to and outside the scope of this work.

<sup>1</sup> We compare against *both* versions of Perses in Sect. 4.1.

## **3.1 Pardis Hybrid**

The initial priority aware technique from Algorithm 1 can also encounter performance bottlenecks, however. The original motivation for using DD on lists of children in the AST was that its best case behavior is O(log(n)) where n is the number of children in the list. This is because it tries removing multiple children at the same time. Processing one node at a time, however, requires that every list element is considered individually, guaranteeing O(n) time for one round of Algorithm 1. Priority aware reduction that proceeds one node at a time faces a different set of inefficiencies that can still cause stalls in the reduction process.

Thus, we desire a means of removing multiple elements from lists at the same time while *still* preserving priority awareness. In order to achieve this, we developed Pardis Hybrid, as presented in Algorithm 2. This approach uses a modified prioritizer as presented in Fig. 4 that first orders by token weight, then by parent traversal order, then by node traversal order. The effect this has is that all children of the same parent with the same weight are grouped together. As a result, we can remove them from the priority queue together and perform list based reduction (like DD) to more efficiently remove groups of elements in a list that have the same priority (for instance, nodes 9 and 10 get removed as a group in one trial using Pardis Hybrid as shown in Fig. 2(c)). Because the search is still primarily directed by the token weights of the removed nodes, the technique still fully respects the priorities of the removed nodes.

Similar to the previous approach, line 1 of Algorithm 2 starts by creating the priority queue. Note that it specifically uses the prioritizer ρ<sub>Pardis Hybrid</sub>, which groups children having the same token weight in the priority queue. As long as there are more nodes to consider, line 3 takes all nodes from the queue with the same weight and parent. If the weight of a node is unique, this simply returns a list of length 1. Line 4 filters out non-nullable nodes from the trial, and line 5 just applies list based reduction to any nullable nodes. Lines 6 and 7 then remove the eliminated nodes from the tree and add the children of remaining nodes to the work queue. Again, this algorithm actually runs to a fixed point.
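The grouping step can be sketched as follows. This is a hedged illustration, not the authors' implementation: the `Node` class, the token based oracle, and the toy tree are hypothetical, and `minimize` is a simple halving stand-in for list based reduction rather than DD or OPDD.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List, Optional


class Node:
    """Hypothetical AST node; leaves hold a token, `nullable` marks removable nodes."""

    def __init__(self, label: str, children=(), token: Optional[str] = None,
                 nullable: bool = False):
        self.label, self.token, self.nullable = label, token, nullable
        self.children: List["Node"] = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self

    def tokens(self) -> List[str]:
        if self.token is not None:
            return [self.token]
        return [t for child in self.children for t in child.tokens()]


def bfs_order(root: Node) -> Dict[int, int]:
    """Decreasing, right-to-left BFS numbering (root gets the largest number)."""
    visited, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node)
        queue.extend(reversed(node.children))
    return {id(node): len(visited) - i for i, node in enumerate(visited)}


def try_remove(group: List[Node], root: Node, oracle) -> bool:
    """Detach all nodes of `group` (same parent); undo the change if the oracle fails."""
    saved = [(n, n.parent.children.index(n)) for n in group]
    for n in group:
        n.parent.children.remove(n)
    if oracle(root.tokens()):
        return True
    for n, i in sorted(saved, key=lambda pair: pair[1]):
        n.parent.children.insert(i, n)       # restore left to right
    return False


def minimize(group: List[Node], root: Node, oracle, stats) -> List[Node]:
    """Stand-in list reduction: try the whole group, then recurse on halves."""
    if not group:
        return []
    stats["queries"] += 1
    if try_remove(group, root, oracle):
        return list(group)
    if len(group) == 1:
        return []
    mid = len(group) // 2
    return (minimize(group[:mid], root, oracle, stats)
            + minimize(group[mid:], root, oracle, stats))


def pardis_hybrid_pass(root: Node, oracle: Callable[[List[str]], bool]) -> int:
    order = bfs_order(root)

    def rho(n: Node):
        # (token weight, parent order, node order), negated for Python's min-heap.
        parent = order[id(n.parent)] if n.parent is not None else 0
        return (-len(n.tokens()), -parent, -order[id(n)])

    heap, stats = [(rho(root), root)], {"queries": 0}
    while heap:
        key, first = heapq.heappop(heap)
        group = [first]
        while heap and heap[0][0][:2] == key[:2]:   # same weight and same parent
            group.append(heapq.heappop(heap)[1])
        nullable = [n for n in group if n.nullable and n.parent is not None]
        removed = minimize(nullable, root, oracle, stats)
        for n in group:
            if not any(n is r for r in removed):
                for child in n.children:
                    heapq.heappush(heap, (rho(child), child))
    return stats["queries"]


def leaf(tok: str) -> Node:
    return Node("tok", token=tok)


d_foo = Node("decl", [leaf("decl_foo")], nullable=True)
d_s = Node("decl", [leaf("decl_S")], nullable=True)
block = Node("block", [leaf("use_foo"), leaf("use_S"), leaf("x")], nullable=True)
show = Node("stmt", [leaf("print")], nullable=True)
root = Node("list", [d_foo, d_s, Node("func", [block, show], nullable=True)])


def oracle(toks: List[str]) -> bool:
    uses_ok = all(f"use_{n}" not in toks or f"decl_{n}" in toks for n in ("foo", "S"))
    return "print" in toks and uses_ok


queries = pardis_hybrid_pass(root, oracle)
print(root.tokens(), queries)
```

In this toy tree the two declarations share a parent and a weight, so they are popped as one group and removed with a single oracle query once their uses are gone.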

While the worst case behavior of DD is O(n<sup>2</sup>) [2], this can be improved to O(n) by giving up hard *guarantees* on minimality [13]. Since this reduction process is performed to a fixed point anyway, minimize on line 5 makes use of this O(n) approach to list based reduction (OPDD) without losing 1-minimality. As a result, the theoretical complexity of Pardis Hybrid is the same as Pardis.

#### **3.2 Nullability Pruning**

Finally, we observed that many oracle queries were simply unnecessary. Specifically, recall that a node can be tagged nullable because it is an element of a list or a child of an optional node, as previously defined by Perses grammar transformations [6]. The complete algorithm for this tagging is in *TagNullable* of Algorithm 3. However, for example, a list of one element could contain another list of one element. In the AST, this appears as a chain of nodes, at least two of which are nullable. Removing *any one* of these nodes removes the same tokens from the AST. Thus, it is only necessary to select a single nullable node from any *chain* of nodes, and the others can be disregarded.

We exploit this through an optimization called *nullability pruning*. We traverse every chain of nodes in the AST, preserving the nullability of the highest node in the chain and removing nullability from those below it. The complete algorithm is presented in *PruneNullable* of Algorithm 3. In effect, it is just a depth first search that removes redundant nullability from nodes along the way.
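A minimal sketch of tagging and pruning follows. It is written from the prose description only, since Algorithm 3 is not reproduced here; the `Node` class, the `kind` labels, and the function names are assumptions, and a chain is taken to be a run of nodes that each have exactly one child.

```python
from typing import List, Optional


class Node:
    """Hypothetical AST node; `kind` is "list", "optional", or "regular"."""

    def __init__(self, kind: str, children=(), token: Optional[str] = None):
        self.kind = kind
        self.children: List["Node"] = list(children)
        self.token = token
        self.nullable = False


def tag_nullable(node: Node) -> None:
    """Mark list elements and children of optional nodes as nullable."""
    for child in node.children:
        if node.kind in ("list", "optional"):
            child.nullable = True
        tag_nullable(child)


def prune_nullable(node: Node, covered: bool = False) -> None:
    """Depth first search keeping only the highest nullable node of each chain.

    `covered` is true when an ancestor in the same single-child chain is
    already nullable, so removing this node would remove the same tokens.
    """
    if covered:
        node.nullable = False
    in_chain = len(node.children) == 1
    for child in node.children:
        prune_nullable(child, covered=in_chain and (covered or node.nullable))


# A one-element list that contains another one-element list, next to an optional.
item = Node("regular", token="x")
inner = Node("list", [item])
outer = Node("list", [inner])
opt_child = Node("regular", [Node("regular", token="a"), Node("regular", token="b")])
root = Node("regular", [outer, Node("optional", [opt_child])])

tag_nullable(root)
before = [inner.nullable, item.nullable, opt_child.nullable]
prune_nullable(root)
after = [inner.nullable, item.nullable, opt_child.nullable]
print(before, after)
```

After pruning, `item` is no longer a removal candidate because removing `inner` deletes exactly the same token, which saves one oracle query for every such redundant node.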

In practice, we find that this can statically (ahead of time) prune most of the AST from the search space. Specifically, in the benchmarks we examine in Sect. 4, we find that of 1,593,875 total nullable nodes, 17% are redundant optional nodes and 44% are redundant list element nodes, so 61% of the nullable nodes are redundant. We observe the impact of this pruning on the actual reduction process in Sect. 4.1.


## **4 Evaluation**

We evaluate Pardis's performance and examine the impact of priority inversion on reduction by answering the following research questions:


## **4.1 RQ1. Performance: Pardis vs. Perses**

**Experimental Set-Up.** We evaluate Pardis on the set of C test cases used in the evaluation of Perses, including the oracle scripts provided by the authors of Perses. While using these, we observed that they still allowed for some undefined behavior [5,14], so we updated all oracles to reject test case variants with undefined behavior. As a result, we were able to reproduce bugs for 14 out of 20 original test cases. The remaining benchmarks, whose original failures could not be reproduced, were omitted from this study. Since the implementation of Perses' components is not publicly available, we implemented the Perses grammar transformations and reduction based on the algorithms available in the paper [6] using the C++ bindings of ANTLR [15]. All of our implementations have been made available<sup>2</sup>. Our experiments were conducted on an Intel Xeon E5-2630 CPU with 64 GB of memory running Ubuntu.

**Variants of Reduction Techniques.** To better explain performance differences, we benchmark several algorithms that each add one difference. All approaches compute fixed points as previously described.


<sup>2</sup> https://github.com/golnazgh/PARDIS.


**Table 1.** Original and reduced test case size and number of oracle queries.

**Reduction Performance.** We compare these techniques in terms of *the number of oracle queries* (Q), *reduction quality* or size of the final reduced test case (R), *reduction time* (T), and *reduction speed* or the average number of tokens removed per second (E). Results are presented in Tables 1 and 2. The best values of queries, time, and speed are highlighted for each test case. As can be seen, in all cases, either Pardis or Pardis Hybrid outperforms all variants of Perses. Compared to the full removal-based Perses algorithm (Perses DD), our proposed algorithms reduce **1.3x** to **7.8x** faster and with **46%** to **80%** fewer queries. The results across variants suggest that these benefits arise from priority awareness and nullability pruning. Due to fixed point computation, all approaches produce test cases from which no single token can be removed while satisfying ψ (1-minimal) [2], but they can produce different final reduced test cases [2]. On average, Pardis yields reduced test cases with 574 tokens compared to Perses DD with 609 tokens.

**Table 2.** Reduction time and speed for different variants of reduction techniques.

In addition, we graphed the reduction progress of each test case for the different variants. Fig. 5 shows the percentage of remaining tokens over time during reduction. For the sake of space, we only include graphs for six of the test cases. Note that the y-axis is log scaled. Pardis and Pardis Hybrid show much faster convergence to a reduced test case compared to Perses variants. Recall that the only factor differentiating Perses N from Pardis w/o Pruning is the order in which the queue of nodes is traversed. Unlike Perses N, Pardis w/o Pruning does not suffer from priority inversion and guides the reduction process based on token weights of the nodes to remove. As can be seen, this advantage leads to faster convergence to a reduced test case. We examine the impact of priority inversion on reduction speed more rigorously in Sect. 4.2.

**Replacement.** As mentioned in Sect. 3, Perses also considers a replacement strategy for non-list or -optional nodes in addition to removal for other nodes. For instance, in Fig. 3, Perses will attempt to replace node 6 with node 14 because they both match the same grammar rule (compound stmt). This replacement fails since required declarations will get removed and ψ will return false.

Including replacement significantly increases the work done by reduction. For completeness, we implemented Perses DD with replacement as described in their paper [6] and defined a four-hour timeout for the reduction process. In 11 out of 14 cases, Perses DD with replacement could not finish the reduction process before reaching the timeout. In the remaining three, it generated reduced test cases with the same size or slightly smaller while performing a significantly larger number of oracle queries (more than 3× over Perses DD without replacement).

#### **4.2 RQ2. The Impact of Priority Inversion**

As shown in Fig. 5, avoiding priority inversion leads to faster convergence. One explanation for this is that priority awareness may decrease the amount of work required to remove a token (as seen in the motivating example). We explore this in a case study on gcc-64990 with 148,931 tokens. The *number of removal attempts* for a token is the number of times a single token is considered for removal. Removing any ancestor of a token in the AST will remove that token, so if a first attempt fails, a deeper ancestor may be attempted. We compute this for every token of the test case to get a sense of the work required for each token. A better traversal order of the AST should cause fewer overall token removal attempts. To measure only the impact of different traversal orders, we compare Pardis w/o Pruning with Perses N. As described in Sect. 4.1, they follow the exact same reduction rules and differ only in their traversal orders.

Figure 6 depicts histograms of the distributions of token removal attempts for Pardis w/o Pruning and Perses N. For clearer visualization, we show only the distributions for the number of attempts less than or equal to 20. We can see how the Perses N distribution is inclined toward a larger number of removal attempts, an indicator of more work required in order to remove individual tokens. In addition, we statistically verify that the difference between the removal attempt distributions is significant. We use a one sided Wilcoxon rank-sum test [16] to determine whether the distribution of Perses N is indeed greater than that of Pardis w/o Pruning. The p-value computed for our data was less than 2.2 × 10<sup>−16</sup>, which strongly supports this observation.

**Fig. 5.** Converging to a reduced test case in six variants of reduction techniques.
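As a rough illustration of the statistical check, the sketch below computes a one sided Wilcoxon rank-sum p-value in plain Python using the normal approximation with a tie correction. The function name and the two small samples are hypothetical and are not the measured gcc-64990 data; the sketch also assumes that the pooled sample is not constant, since the variance would otherwise be zero.

```python
import math
from typing import Sequence


def rank_sum_greater(x: Sequence[float], y: Sequence[float]) -> float:
    """One sided p-value for the alternative "x tends to be larger than y".

    Normal approximation with tie correction, no continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2            # mean of ranks i+1 .. j
        t = j - i                             # size of this group of ties
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2        # Mann-Whitney U statistic for x
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


# Hypothetical removal attempt counts per token (not the paper's data).
perses_n_attempts = [3, 4, 4, 5, 6, 6, 7, 8, 9, 10]
pardis_attempts = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]
p_value = rank_sum_greater(perses_n_attempts, pardis_attempts)
print(p_value)
```

A small p-value supports the one sided alternative that the first sample is stochastically larger. With one value per token, as in the case study, the sample sizes are large enough for the normal approximation to be reasonable.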

**Fig. 6.** Distributions of token removal attempts for Pardis w/o Pruning and Perses N.

## **5 Discussion**

**Pardis Hybrid as a *sweet spot* in reducing test cases:** As discussed earlier, unlike Perses, Pardis Hybrid does not suffer from priority inversion because it prioritizes the search primarily on the token weight of nodes being considered for removal. Moreover, unlike Pardis, it does not strictly remove one node at a time and allows the removal of nodes with the same weight and the same parent as a group. Hence, it can be considered a sweet spot in reducing test cases. We conduct two studies that can further explore this idea.

*(1) Oracle Verification Time.* The number of oracle queries is a common metric used in similar studies to reason about reduction efficiency since it directly impacts the total reduction time [2,3,6,13,17]. For instance, both Pardis and Pardis Hybrid perform fewer oracle queries and take less time than Perses. However, the number of oracle queries is not the only factor involved. The time required to run each of these queries, or *oracle verification time*, also affects the total running time. For instance, as presented in Sect. 4.1, Pardis has the smallest number of oracle queries in 12 out of 14 test cases. However, in terms of total reduction time and speed, Pardis Hybrid is the fastest in 8 out of 14 cases, even while performing *more* queries compared to Pardis in 6 of them. Oracle verification time can depend on multiple elements such as the size and complexity of the test case. Since Pardis Hybrid takes advantage of the possibility to remove more than one node at a time, it may try variants of the test case that are smaller and may be faster to verify compared to Pardis. To check this hypothesis, we conducted a case study on gcc-64990 and recorded the running time of each oracle query during reduction. As shown in Tables 1 and 2, Pardis reduces this test case in 932 s with 2,632 queries, and Pardis Hybrid has a total reduction time of 916 s (16 s shorter) while performing 3,148 oracle queries (516 more queries). Both techniques yield the same final test case.

Figure 7 depicts the distribution of oracle verification times in Pardis and Pardis Hybrid, showing that Pardis has more queries that take longer compared to Pardis Hybrid. The shorter queries in Pardis Hybrid directly decrease its overall reduction time. As a result, Pardis Hybrid reduces test cases with fewer queries than Perses and with shorter queries than Pardis.

**Fig. 7.** Distribution of oracle verification time for Pardis and Pardis Hybrid.

**Fig. 8.** Distribution of token weights of nodes visited during Pardis reduction.

*(2) Distribution of Token Weights.* The motivation behind proposing Pardis Hybrid as discussed in Sect. 3.1 was that if lists in a test case shrink after removing nodes with large unique token weights, applying DD on list elements with the same weight can be beneficial. In fact, the more of the remaining nodes that share token weights, the more beneficial using DD becomes since it provides the opportunity to remove those nodes in just one trial. This can avoid the possibly time-consuming process of visiting nodes one by one. To understand the distribution of token weights in practice, we perform Pardis (the one node at a time removal) on gcc-64990 and record token weights of nodes visited during the removal process. Figure 8 shows the distribution with **5** as the median of token weights of nodes visited during the reduction. The small median motivates the use of Pardis Hybrid in practice since it indicates that half of the nodes have one of only five different token weights and can benefit from the grouped removals.

**Syntactic vs Semantic Validity:** Perses and Pardis discard *syntactically* invalid variants of the test case during reduction. However, there are also *semantically* invalid queries such as removing the declaration of a variable before removing its use. SGPR techniques cannot entirely avoid these queries since they guide the reduction process based on the syntax of the grammar. However, the priority order of Pardis can mitigate this problem. By prioritizing by token weight, it is more likely to visit and remove uses before spending effort on declarations. One reason for this is that a higher token weight tends to mean that there are more uses beneath that node. For instance, in Fig. 3, uses of variables a, b and c are descendants of node 11 with nodes 8, 9 and 10 as their declarations. Pardis removes the uses by first removing node 11 while Perses tries to remove the declarations first due to priority inversion. Hence, Pardis prunes nodes in one pass of the AST that Perses may require a fixed point mode to remove.

**Threats to Validity:** We evaluated Pardis on the same set of C test cases used in the evaluation of Perses. The implementation of Perses' grammar transformations and reduction is not publicly available, so we reimplemented Perses as described in its paper. Our implementation has been made available to provide a consistent platform for future work. However, the exact implementations, environmental settings and the scripts to check the property of interest can all impact the final results. For instance, the final sizes of the reduced test cases reported for the original Perses implementation [6] are smaller than those using our reimplemented version of Perses. As discussed in Sect. 4.1, this may be because Perses' oracles allowed for undefined behavior, which can lead to smaller but invalid reduced test cases. To mitigate this problem, we made the oracles strictly prevent undefined behavior for both Pardis and Perses. Note that Pardis significantly outperforms both Perses' original implementation [6] and our reimplementation in terms of number of oracle queries.

While the techniques presented in Pardis are generally applicable, our evaluation focuses on C in order to compare with Perses. Further investigation is required to claim that the performance benefits extend to other languages.

## **6 Related Work**

The closest work to this paper is Perses [6]. Unlike Pardis, it suffers from priority inversion that adversely affects the reduction speed. Other generic test case reduction techniques are Delta Debugging (DD) [2], its O(n) variant [13], and Berkeley Delta [18]. These face challenges when reducing hierarchical inputs. Several techniques focus on reducing hierarchically structured test cases [3,4,6,11,12,19,20]. Among these, only Perses is priority aware, in spite of its priority inversion. Indeed, most techniques process the input level by level. Like Pardis, Perses and Simp [20] are notable exceptions in that they can search across levels when deciding how to reduce. However, Simp is specific to SQL queries. GTR [12] is notable in that it is trained to decide when to apply different reduction operations. Finally, C-Reduce [5] is a tool for reducing C/C++ test cases that requires extensive domain-specific knowledge.

## **7 Conclusions**

We have shown that the prior state of the art for test case reduction suffers from *priority inversion* and that this causes a significant increase in reduction time. We proposed priority aware reduction techniques, Pardis and Pardis Hybrid, that focus reduction effort where they can have the most impact. These techniques can speed reduction by 1.3× to 7.8× over the prior state of the art.

**Acknowledgements.** This research was partially supported by the Natural Sciences and Engineering Research Council of Canada.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Automatically Identifying Sufficient Object Builders from Module APIs**

Pablo Ponzio<sup>1,3</sup>(✉), Valeria S. Bengolea<sup>1</sup>, Mariano Politano<sup>1,3</sup>, Nazareno Aguirre<sup>1,3</sup>, and Marcelo F. Frias<sup>2,3</sup>

<sup>1</sup> Universidad Nacional de Río Cuarto, Río Cuarto, Argentina {pponzio,vbengolea,mpolitano,naguirre}@dc.exa.unrc.edu.ar

<sup>2</sup> Instituto Tecnológico de Buenos Aires (ITBA), Buenos Aires, Argentina mfrias@itba.edu.ar

<sup>3</sup> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina

**Abstract.** Various approaches to software analysis (e.g. test input generation, software model checking) require engineers to (manually) identify a subset of a module's methods in order to drive the analysis. Given a module to be analyzed, engineers typically select a subset of its methods to be considered as object builders to define a so-called driver, which will be used to automatically build objects for analysis, e.g., combining them non-deterministically, randomly, etc. This requires a careful inspection of the module and its API, since both the relative exhaustiveness of the analysis (leaving important methods out may systematically avoid generating different objects) and its efficiency (the number of bounded combinations of methods grows exponentially as the number of methods increases) are affected by the selection.

We propose an approach for automatically selecting a set of builders from a module's API, based on an evolutionary algorithm that favors sets of methods whose combinations lead to producing larger sets of objects. The algorithm also takes into account other characteristics of these sets of methods, trying to prioritize the selection of methods with fewer and simpler parameters. As the implementation of this evolutionary mechanism requires in principle handling and comparing large sets of objects, and this grows very quickly both in terms of space and running times, we employ an abstraction of sets of objects, called field extensions, that involves using the field values of the objects in the set instead of the actual objects, and enables us to effectively implement our mechanism. An experimental assessment on a benchmark of stateful classes shows that our approach can automatically identify sets of builders that are sufficient (can be used to create any instance of the module) and minimal (do not contain superfluous methods), in a reasonable time.

## **1 Introduction**

As software is becoming more ubiquitous thanks to the rapid advances in technology, guaranteeing the functional correctness of software is more crucial than ever. Thus, a research area of growing importance is that of automated software analysis, whose goal is to assist engineers, through the provision of tools for automated analysis, in finding deficiencies both in software and software related models. Automated test generation [1,11,13,17,24,25,28,29,32], software model checking [9,34,35], and static analyses [6,16], among many others, are prominent approaches in this line of research.

While these techniques involve in many cases fully automated analyses, their application often requires some effort from the engineers. Software model checkers rely on the definition of *drivers*, programs that allow one to build inputs for the code under analysis. Similarly, in parameterized-unit testing approaches [33] a mechanism for building inputs is mandatory. Some symbolic execution based tools require so-called *"object factories"* to build test cases involving inputs with non-primitive types [32]. Automated test generation techniques based on a module's API can be used for building inputs for non-primitive types [11,24], thus automating the above-mentioned input-generation issues. But they usually present difficulties in generating a good set of diverse inputs for stateful, complex structures. This is even more difficult for structures with rich APIs [26]. Many authors have addressed this problem by defining different approaches for guiding test generation, to create more diverse sets of inputs [7,26].

In this paper, we take a different approach to address the problem of generating better inputs for stateful modules. We observe that the selection of routines from a module API, to feed an input generation tool so as to build input structures for program analysis (drivers for model checking, input structures for parameterized unit tests, etc.), has a crucial impact on the analysis. We call *builders* a set of routines *B*, drawn from the API of a module *M*, that can be employed to create input structures in an automated program analysis for *M* (e.g. a driver for model checking). Clearly, the higher the number of different structures that can be created with *B*, the better the chances to find bugs in *M*. As the number of instances of a software module is potentially infinite, and the program analyses we target are also limited in the number of structures they can employ, we limit ourselves to a bounded-exhaustive set of structures for *M* [4] (e.g. all the instances of a linked list with up to *k* nodes). We denote this set by *BE*(*M, k*). We say that a set of builders is *sufficient* if its routines can be combined to build all the instances in *BE*(*M, k*). Thus, sufficient builders are the best possible choice for bug finding (in a bounded setting). Notice that *B* can contain superfluous routines. A superfluous routine *s* is such that *BE*(*M, k*) can be built using routines in *B* − {*s*} (the simplest example being routines that never change the state of their parameters). These routines provide no benefits in terms of bug finding capabilities of the analysis. We call *minimal* a set of builders with no superfluous routines. Minimality is important because providing an analysis tool with superfluous routines often negatively impacts its efficiency (the number of ways *k* routines can be combined usually increases exponentially with *k*).

Manually selecting sufficient and minimal builders is not an easy task: it requires a thorough analysis of the available routines and a deep understanding of the program semantics. This is especially hard for programs with rich APIs, where there are many routines and a lot of redundancy in the API (see Sect. 2). We propose an automated approach for identifying such a sufficient and minimal set of builders, based on an evolutionary algorithm that searches for a minimal set of routines that is capable of generating the maximum number of different (bounded) objects (i.e., *BE*(*M, k*)). Moreover, our evolutionary approach also takes into account other characteristics of the builders, such as the number and complexity of their parameters, so that "simpler" routines are favored in the search. The goal is to choose builders that can be more easily and more efficiently used by the subsequent program analyses.

The fitness value for a set of routines *R* is based on the number of bounded structures that can be generated using combinations of these routines. To compute the fitness we use a modified version of a random test case generation tool (Randoop [24]) to generate as many bounded structures as possible from *R*, allowing at most *k* objects of each type in the structures (*k* is a parameter to our algorithm). As sets of objects are very expensive to maintain and manipulate, both in terms of space and running time, we employ an efficient abstraction of a set of objects, called *field extensions*, defined as the set of field values appearing in any of the objects in the set [25]. Thus, instead of counting the number of different objects achieved by a candidate, the fitness function will compute the field extensions as objects are generated, and return the number of field values in the extensions. Intuitively, a higher number of field values in the field extensions means that the builders can be used to construct a more diverse set of objects, and therefore they should be preferred over other sets of builders.

We assess our approach experimentally on a benchmark of stateful Java classes drawn from the literature. The results show that in our case studies our approach identifies sets of routines that are sufficient and minimal, in a reasonable time. We also assess the impact of our approach in an automated analysis, namely, in the generation of test cases for parameterized tests. We compare how the random test case generation tool Randoop behaves when fed with the full module API, against providing the tool with only the builders identified by our approach. The results indicate that in the latter case Randoop generated more (and larger) objects, within a fixed time budget.

## **2 Motivating Example**

In this section, we motivate our approach by means of a running example. The Apache NodeCachingLinkedList (NCL for short) [36] consists of a main circular doubly linked list, that holds the elements of the collection, and a secondary singly linked list that acts as a cache for nodes that have been removed from the main list. Nodes stored in the cache can be reused, and added again to the main list when inserting elements in the main list. Thanks to its cache, in applications where insertions and removals from the list are very frequent, NCL can significantly reduce the overhead needed for memory allocation and garbage collection of nodes. As an illustration, Fig. 1 shows the three NCL instances that can be built with exactly two nodes.

**Fig. 1.** Three NodeCachingLinkedList instances with exactly two nodes



NCL has a very rich API, as shown in Table 1. However, for building any feasible NCL object only a few methods from the API suffice. For example, combinations of the methods in Fig. 1.1, when instantiated with appropriate parameters, can be used to build any desired (finite) NCL object. Thus, the methods therein are an example of a sufficient set of builders. Notice that, after using the constructor, the main list of NCL can be populated just by using the addFirst method. However, if we want to generate instances where the cache is not empty, we can do so through the removeFirst method, as the sufficient set of builders suggests. For most automated analyses, we would like to consider as many different scenarios (inputs) as possible, hence the motivation to build sufficient sets of builders. Furthermore, the builders in Fig. 1.1 are also minimal, since without any one of them some NCL objects could no longer be constructed with the remaining routines.

```
(0) NodeCachingLinkedList()
(7) addFirst(Object)
(25) removeFirst()
```
**Figure 1.1.** A sufficient set of builders for NCL
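
The role of removeFirst in the set above can be illustrated with a small sketch. It does not use the real NCL class: a list is abstracted to a hypothetical "shape" (number of nodes in the main list, number of nodes in the cache), and `ToyNclShapes`, `ADD_FIRST` and `REMOVE_FIRST` are names introduced here only for illustration. The sketch assumes that addFirst reuses a cached node when one is available, that removeFirst moves a node to the cache, and that the cache capacity is never reached for small bounds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy abstraction of NCL (an assumption): a shape is [nodes in main list, nodes in cache]. */
final class ToyNclShapes {
    // addFirst reuses a cached node when one is available, otherwise allocates a new node.
    static final UnaryOperator<List<Integer>> ADD_FIRST = s ->
            s.get(1) > 0 ? List.of(s.get(0) + 1, s.get(1) - 1) : List.of(s.get(0) + 1, 0);

    // removeFirst moves the first main-list node to the cache.
    // On an empty main list the real method fails; here the shape is simply left unchanged.
    static final UnaryOperator<List<Integer>> REMOVE_FIRST = s ->
            s.get(0) > 0 ? List.of(s.get(0) - 1, s.get(1) + 1) : s;

    /** Shapes with at most k nodes reachable from the empty list using the given builders. */
    static Set<List<Integer>> reachable(List<UnaryOperator<List<Integer>>> builders, int k) {
        Set<List<Integer>> seen = new LinkedHashSet<>();
        Deque<List<Integer>> work = new ArrayDeque<>();
        List<Integer> empty = List.of(0, 0);   // shape produced by the constructor
        seen.add(empty);
        work.add(empty);
        while (!work.isEmpty()) {
            List<Integer> shape = work.poll();
            for (UnaryOperator<List<Integer>> builder : builders) {
                List<Integer> next = builder.apply(shape);
                if (next.get(0) + next.get(1) <= k && seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(List.of(ADD_FIRST), 2));
        System.out.println(reachable(List.of(ADD_FIRST, REMOVE_FIRST), 2));
    }
}
```

With *k* = 2, the constructor plus addFirst reaches only the three shapes with an empty cache, whereas adding removeFirst reaches all six shapes with at most two nodes, including the three two-node shapes that correspond to the instances in Fig. 1.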

```
(3) add(Object)
(4) add(int, Object)
(7) addFirst(Object)
(8) addLast(Object)
```
**Figure 1.2.** Add variants that can be used to populate NCL's main list

Notice that there can be many sets of sufficient and minimal builders. For example, we get sufficient and minimal builders by replacing addFirst in Fig. 1.1 with any of the other add variants shown in Fig. 1.2, as for any way of filling up NCL's main list with addFirst there exists a different way to build the same object using another add variant (perhaps invoked with different parameters and changing the execution order).

We also observe that the simpler the parameters of a routine, the easier the routine is to use for generating inputs in the context of a program analysis. For instance, among the alternative add routines for NCL (Fig. 1.2), add(int,Object) receives more parameters than the other three methods, so it is harder to generate parameters for it when generating inputs. This makes the other three alternatives preferable. Thus, our approach takes into account the number of parameters and their complexity when selecting the best possible builders.

Many methods in Table 1 are marked as observers (column Obs?), meaning that they do not modify the objects they operate on, nor are they useful for creating non-primitive objects. Hence, observers are always superfluous, and should never be included in a set of minimal builders. Our approach tries to recognize them beforehand, and discards them from the search to significantly reduce the search space.

To conclude this section we remark that, when fed with NCL's whole API, our approach automatically identified the sufficient and minimal set of builders for NCL shown in Fig. 1.1.

## **3 Background**

#### **3.1 Field Extensions**

The idea behind field extensions [25] is to define a representation for a set of objects that is smaller in size and easier to manipulate algorithmically. This representation implies some loss of information, but for certain applications (like the one in this paper) it is precise enough to be useful in practice [1,12,25,26,29].

$$\begin{aligned} head &= (L_0, null),\ (L_0, N_0) \\ cache &= (L_0, null),\ (L_0, N_1),\ (L_0, N_0) \\ next &= (N_0, N_1),\ (N_1, N_0),\ (N_0, N_0),\ (N_1, null) \\ prev &= (N_0, N_1),\ (N_0, N_0),\ (N_1, null),\ (N_0, null) \end{aligned}$$

**Figure 1.3.** Field extensions for the set of instances in Fig. 1

Given a set *S* of objects, its field extensions representation consists of a set of pairs for each field f, such that (obj,val) belongs to the field extensions of f if obj.f = val (i.e., the value of f for obj equals val), for some object obj in *S*. As an example, consider the instances displayed in Fig. 1. Their corresponding field extensions are shown in Fig. 1.3. We omit the values stored in the nodes for the sake of clarity. Notice that structure (a) in Fig. 1 can be built using only add methods, whereas for (b) and (c) we also have to employ some kind of remove operation, to move nodes from the main list to the cache. Notice that values (*L*0*, N*0) and (*L*0*, N*1) for the cache field only appear in the field extensions when the structures have nodes in the cache, like (b) and (c). In addition, prev fields of nodes in the cache are always *null*, but prev fields can never be *null* in the main list (due to its circularity). This means that field extensions for structures that have non-empty caches have the potential of having a larger number of values than those for structures with empty caches.
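
The definition above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `FieldExtensions` and its methods are hypothetical names, objects and values are given as already-canonical string identifiers such as `L0` or `N1`, and the actual tool of the paper may represent extensions differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Field extensions: for each field, the set of (object, value) pairs seen so far. */
final class FieldExtensions {
    private final Map<String, Set<String>> pairsByField = new TreeMap<>();

    /** Records that obj.field == value; identifiers are assumed to be canonical already. */
    void add(String obj, String field, String value) {
        pairsByField.computeIfAbsent(field, f -> new TreeSet<>())
                    .add("(" + obj + ", " + value + ")");
    }

    /** Total number of field values across all fields. */
    int size() {
        return pairsByField.values().stream().mapToInt(Set::size).sum();
    }

    /** The pairs recorded for one field (empty if the field was never seen). */
    Set<String> valuesOf(String field) {
        return pairsByField.getOrDefault(field, Set.of());
    }
}
```

Because each field maps to a set, a pair contributed by several objects is stored once, which is why the representation stays small even when many objects are generated.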

It is important to canonicalize structures before computing field extensions [12]. Canonicalization involves assigning unique identifiers *N*0*, N*1*, ...* to each of its nodes during a traversal of the structure (we employ a breadth first traversal), starting at the root. Nodes visited first receive smaller identifiers than those visited afterwards during the traversal. Fields must be visited in a fixed order. Note that structures in Fig. 1 are all in canonical breadth-first form.
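
A minimal sketch of breadth-first canonicalization follows. The classes `Node` and `ToyList` are hypothetical stand-ins for a list with head and cache fields and nodes with next and prev fields; they are not the NCL classes, and the fixed field order (head, cache, then next, prev) is an assumption made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class Canonicalizer {
    static final class Node { Node next; Node prev; }
    static final class ToyList { Node head; Node cache; }

    /** Lists the field values of one structure, with nodes named N0, N1, ... in BFS order. */
    static List<String> fieldValues(ToyList list) {
        Map<Node, String> ids = new IdentityHashMap<>();   // identity-based: nodes are never compared by content
        Deque<Node> queue = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        // Fields of the root L0 are visited in a fixed order: head, then cache.
        out.add("head(L0, " + visit(list.head, ids, queue) + ")");
        out.add("cache(L0, " + visit(list.cache, ids, queue) + ")");
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            String id = ids.get(n);
            // Fields of each node are also visited in a fixed order: next, then prev.
            out.add("next(" + id + ", " + visit(n.next, ids, queue) + ")");
            out.add("prev(" + id + ", " + visit(n.prev, ids, queue) + ")");
        }
        return out;
    }

    /** Returns the identifier of a node, assigning the next free one on first visit. */
    private static String visit(Node n, Map<Node, String> ids, Deque<Node> queue) {
        if (n == null) return "null";
        String id = ids.get(n);
        if (id == null) {
            id = "N" + ids.size();
            ids.put(n, id);
            queue.add(n);
        }
        return id;
    }
}
```

Since identifiers depend only on the traversal order, two structures with the same shape produce the same field values regardless of which concrete node objects they contain.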

#### **3.2 Random Test Case Generation**

Random test generation consists of randomly producing inputs in order to test software [8,21,24]. Random input generation is straightforward when considering basic (numeric) data types, but producing inputs of other more complex types, in particular instances of *stateful* classes, is less obvious and calls for a more complex mechanism than just using random number generators. One such mechanism, which has been implemented by various tools for random test generation for object-oriented code, is based on randomly combining method sequences that produce inputs of different types [8,21,24]. The process associated with the Randoop tool [24] that we use here works essentially as follows. For every datatype, a set of sequences that produce inputs of that datatype is maintained. To start with, for basic data types, a set of initial values is considered, and for class types, only null is considered at first (these can be considered test sequences of size one). The procedure to build a new test sequence starts by randomly selecting a method *m* among all methods in the software under test. For example, one could randomly choose one of the methods of NCL's API (Table 1), say add(Object). To actually build the test sequence, values for each of the parameters of the method *m*, of the corresponding types, have to be provided. These are obtained by randomly selecting test sequences from the sets of sequences of the corresponding types, and sequentially composing them, with method *m* as the last statement. As an example, say that a sequence containing only the constructor of NCL is randomly selected from the available sequences for the NCL type, and for the parameter of add, an Integer with value 0 is randomly chosen. Combining all these sequences together results in:

```
NodeCachingLinkedList l = new NodeCachingLinkedList();
l.add(new Integer(0));
```
This new sequence can now be stored for later use as a parameter for other methods that operate on NCL objects.

This process is repeated until either a time budget is exhausted, or the desired number of tests (set by the user) is generated. Randoop uses guidance from the execution of tests to avoid generating illegal tests. We refer the interested reader to the article introducing Randoop [24], for further details.
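
The loop described above can be sketched as follows. This is a heavily simplified illustration and not Randoop's implementation or API: the class under test is assumed to be `java.util.ArrayDeque`, only two methods (addFirst and removeFirst) are considered, primitive arguments are drawn from 0 to 2, and the names `ToySequenceGenerator`, `Sequence` and `generate` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

final class ToySequenceGenerator {
    /** A call sequence together with the object obtained by executing it. */
    static final class Sequence {
        final List<String> calls;
        final Deque<Integer> object;
        Sequence(List<String> calls, Deque<Integer> object) {
            this.calls = calls;
            this.object = object;
        }
    }

    /** Extends randomly chosen stored sequences until the pool is full or attempts run out. */
    static List<Sequence> generate(Random rnd, int maxSequences, int attempts) {
        List<Sequence> pool = new ArrayList<>();
        // Initial pool: the sequence that only calls the constructor.
        pool.add(new Sequence(List.of("new ArrayDeque<>()"), new ArrayDeque<>()));
        for (int i = 0; i < attempts && pool.size() < maxSequences; i++) {
            Sequence base = pool.get(rnd.nextInt(pool.size()));    // random receiver sequence
            List<String> calls = new ArrayList<>(base.calls);
            Deque<Integer> object = new ArrayDeque<>(base.object); // stands in for re-executing base
            try {
                if (rnd.nextBoolean()) {                 // randomly selected method: addFirst(int)
                    int arg = rnd.nextInt(3);            // randomly selected primitive argument
                    object.addFirst(arg);
                    calls.add("addFirst(" + arg + ")");
                } else {                                 // randomly selected method: removeFirst()
                    object.removeFirst();
                    calls.add("removeFirst()");
                }
                pool.add(new Sequence(calls, object));   // legal sequence: stored for later reuse
            } catch (NoSuchElementException illegal) {
                // Feedback from execution: the sequence threw an exception and is discarded.
            }
        }
        return pool;
    }
}
```

Every stored sequence yields one object of the type under test, which is the property the fitness computation relies on.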

An important issue to remark here is that the execution of each test sequence generated by Randoop produces a number of objects for the given type (NCL in the example). We exploit this characteristic of Randoop to compute the fitness function for a set of methods, although instead of storing actual objects we will maintain field extensions, as we explain in more detail in Sect. 4.

## **4 An Evolutionary Algorithm for Identifying Sufficient Object Builders**

As mentioned before, to find a sufficient set of builders from a program API we design a genetic algorithm, which we describe below. Genetic algorithms [14] are non-exhaustive guided search algorithms, based on a hill climbing strategy [30]. The search space is composed of a generally very large set of individuals (the candidates), and the search objective is to find an individual with sought-for features. As opposed to classic search algorithms, genetic algorithms maintain a set of individuals, called the population, and search progresses by iteratively selecting a number of individuals in the population, using these for evolution (building new individuals out of these), and leaving out some individuals of the whole set (the "old" ones and the "new" ones). Selection of individuals for population evolution, as well as individuals' removal, are guided by a fitness function, the heuristic function used to guide the search. This function applies to individuals, and its result is generalizable to the population too (e.g., the fitness of the population may be taken as the fitness of its "fittest" individual). This function captures the features sought for in the search, and thus can be used as a halting criterion (e.g., the algorithm stops after finding an individual with fitness above a certain threshold). Finally, individuals are often called chromosomes, and represented as vectors of genes that capture their characteristics. This idea is strongly related to how new individuals are constructed: by representing candidates as vectors of independent characteristics, one can build new candidates by combining part of the characteristics of an individual with part of the characteristics of another, or by arbitrarily changing a characteristic of a given individual. These two forms of evolution are called crossover and mutation, respectively, and are the traditional mechanisms to build new candidates out of existing ones in genetic algorithms. For further details, we refer the reader to [22].

#### **4.1 Chromosome Representation**

In the context of our problem, candidate solutions represent sets of methods from the API of the module being analyzed. We therefore employ vectors of boolean values as chromosome representation. Let *n* be the number of methods in the API; the chromosomes in our algorithm will be vectors of size *n*. For any vector, the *i*-th position is true if and only if the chromosome contains the *i*-th method of the API. For example, there are 34 methods in NCL's API (Table 1), and we enumerated them from 0 to 33. The sufficient set of builders in Fig. 1.1 is characterized by the vector with positions 0, 7 and 25 set to true, and the remaining positions set to false. In this case, the whole search space consists of the 2<sup>34</sup> possible chromosomes.
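
As a small illustration of this encoding, the following sketch builds the chromosome for the builders of Fig. 1.1 and decodes it again. The class `Chromosome` and its methods are hypothetical helpers introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class Chromosome {
    /** Builds a chromosome of length n with the given method indices set to true. */
    static boolean[] of(int n, int... methodIndices) {
        boolean[] genes = new boolean[n];
        for (int i : methodIndices) {
            genes[i] = true;
        }
        return genes;
    }

    /** Decodes a chromosome into the indices of the methods it contains. */
    static List<Integer> selected(boolean[] genes) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < genes.length; i++) {
            if (genes[i]) indices.add(i);
        }
        return indices;
    }

    /** Number of candidate chromosomes for an API with n methods (valid for n < 63). */
    static long searchSpace(int n) {
        return 1L << n;
    }

    public static void main(String[] args) {
        boolean[] fig11 = of(34, 0, 7, 25);      // constructor, addFirst, removeFirst
        System.out.println(selected(fig11));     // [0, 7, 25]
        System.out.println(searchSpace(34));     // 17179869184
    }
}
```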

#### **4.2 Fitness Function**

Given a chromosome representing a set of methods *M*, our fitness function computes an approximation of the number of bounded objects that can be built using combinations of methods in *M*. Chromosomes with higher fitness values are estimated to build more objects than those that have smaller fitness values.

Ideally, we would like to explore all the feasible objects within a small bound *k* that can be built using the methods of the current chromosome, i.e., *BE*(*M, k*). In other words, we need a bounded exhaustive generator for the set of methods. The bound *k* represents the maximum number of objects that can be created for each class (in Fig. 1, the number of nodes in the NCL objects is bounded by *k* = 2), and the maximum number of primitive values available (for example, integers from 0 to *k* − 1). For this purpose, we developed a prototype by modifying the Randoop tool, discussed briefly in Sect. 3.2. First, we altered Randoop to work with a fixed set of primitive values (integers from 0 to *k* − 1). (Normally, Randoop would save primitive values that are returned by the execution of tests, and reuse these values in future tests.) Second, we make Randoop drop sequences of methods that create structures with more than *k* objects (of any type), to stop it from building objects larger than needed. To achieve this, we canonicalize the objects generated by the execution of each sequence, and we discard the sequence if some object has an index equal to or larger than *k*. Third, we extend Randoop with "global" field extensions, and when the execution of a sequence terminates all the field values of the objects generated by the sequence are added to the field extensions. For example, if Randoop had generated the objects in Fig. 1, then the global field extensions would have the values shown in Fig. 1.3. Our goal is that, given a bound *k*, when our modified version of Randoop terminates, the global field extensions contain all the field values of the bounded exhaustive set of structures with up to *k* nodes, *BE*(*M, k*). The result of the fitness function for the chromosome is the number of field values in the global extensions computed by the tool.

```
(0) NodeCachingLinkedList()
(7) addFirst(Object)
(8) addLast(Object)
(25) removeFirst()
```
**Figure 1.4.** A set of sufficient but not minimal builders for NCL

```
(0) NodeCachingLinkedList()
(4) add(int, Object)
(23) remove(Object)
```
**Figure 1.5.** Sufficient and minimal builders for NCL with more complex parameters than the ones in Fig. 1.1

Our rationale for using bounded sets of objects is akin to the small scope hypothesis for bug finding [2]: if one set of methods cannot be used to build small objects that allow us to differentiate it from another set of methods, then it is unlikely that these two sets can be distinguished with larger objects. This hypothesis held during our empirical evaluation across all our case studies.

We found that, despite being affected by chance, our tool rarely misses building objects that should add relevant values to the global extensions, when small values for *k* are employed.

**Choosing Better Sets of Builders.** In this section, we propose two ways to improve our evolutionary algorithm by tailoring the fitness function to obtain better sets of builders. This is strongly motivated by the way builders are used to build inputs in program analysis. On the one hand, if we have two sufficient sets of builders, the set with the smaller number of methods should always be preferred. In this context, there is no reason to include superfluous methods in builders. For example, the builders in Fig. 1.4 can be used to create the same NCL objects as the builders in Fig. 1.1 of Sect. 2 (both sets are sufficient), but they are not minimal since addLast is superfluous.

On the other hand, builders with more parameters, or more complex ones, are more taxing on program analysis, as they require more effort to be adequately instantiated. Thus, we define a simple criterion of parameter complexity and adapt our fitness to favor builders with simpler parameters over the more complex ones. For example, both sets of builders in Figs. 1.1 and 1.5 are sufficient and minimal (with 3 routines each), but builders in Fig. 1.5 have more parameters that need to be instantiated. Comparing Figs. 1.1 and 1.5 we can observe that addFirst has been replaced by add, which has an additional integer parameter, and that removeFirst was interchanged with remove, which possesses a non-primitive parameter of type Object. Following the criteria explained above, we would like our algorithm to choose the set in Fig. 1.1 over that of Fig. 1.5.

Incorporating these ideas, the fitness function of our approach is defined by:

$$\begin{aligned} f(M) &= \#fieldExt(M) \; + \\ &\quad \frac{w_1 \cdot \left(1-\frac{\#M}{\#MT}\right) + w_2 \cdot \left(1-\frac{\#PP(M)+w_3 \cdot RP(M)}{\#PP(MT)+w_3 \cdot RP(MT)}\right)}{w_1+w_2} \end{aligned}$$

For a chromosome representing a set *M* of methods, drawn from the whole set of available methods of the API, *MT*, the most important part of the fitness for *M* is the number of values in the field extensions, #*fieldExt*(*M*), that can be generated using our custom Randoop tool as explained above. The summand on the right implements the ideas presented in this section. It returns a real value in the interval [0, 1] that is useful to break ties between sets of methods that generate field extensions with the same number of values. In the dividend, the first summand penalizes sets with larger numbers of methods, by computing the quotient of the number of methods in *M* to the number of methods in *MT*, and subtracting the result from 1. Constant *w*<sub>1</sub> (*w*<sub>1</sub> ≥ 1) allows us to increase/decrease the weight of this summand with respect to the other summand. The second summand in the dividend penalizes sets of methods with more complex parameters. Similarly to *w*<sub>1</sub>, constant *w*<sub>2</sub> (*w*<sub>2</sub> ≥ 1) serves the purpose of increasing/decreasing the weight of this factor in the sum. Notice that we sum up the parameters differently depending on their types: each primitive parameter adds 1 (*PP*(*M*) is the number of primitive parameters in the methods of *M*), and each reference parameter adds a constant *w*<sub>3</sub> (*w*<sub>3</sub> ≥ 1, *RP*(*M*) is the number of reference-typed parameters in the methods of *M*), which allows us to increase the weight of reference parameters with respect to primitive ones. Intuitively, this second summand computes the ratio of the number of parameters of *M* (with added weight for reference parameters) to the number of (weighted) parameters of *MT*, and subtracts the result from 1. Finally, we divide by *w*<sub>1</sub> + *w*<sub>2</sub> to obtain the desired number in the interval [0, 1].

In our experimental assessment we set *w*<sub>1</sub> = 2, *w*<sub>2</sub> = 1, *w*<sub>3</sub> = 2. These values were good enough for our approach to produce sufficient and minimal sets of builders in all our case studies.
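
The tie-breaking summand can be checked on the builders of Figs. 1.1 and 1.5 with a short sketch. The class `BuilderFitness` and its method names are hypothetical. The parameter counts for the two figures are read off the method signatures shown there (the receiver object is assumed not to count as a parameter), while the totals for the whole API (10 primitive and 20 reference parameters) are placeholder values chosen only for illustration, since they depend on Table 1.

```java
final class BuilderFitness {
    /**
     * Tie-breaking summand of the fitness function.
     * m, mt: number of methods in M and in the whole API MT;
     * pp, rp: primitive and reference parameters in M; ppTotal, rpTotal: the same for MT.
     */
    static double tieBreaker(int m, int mt, int pp, int rp, int ppTotal, int rpTotal,
                             double w1, double w2, double w3) {
        double sizeTerm = 1.0 - (double) m / mt;
        double paramTerm = 1.0 - (pp + w3 * rp) / (ppTotal + w3 * rpTotal);
        return (w1 * sizeTerm + w2 * paramTerm) / (w1 + w2);
    }

    /** Full fitness: number of field values plus the tie-breaking summand. */
    static double fitness(int fieldExtSize, double tieBreaker) {
        return fieldExtSize + tieBreaker;
    }

    public static void main(String[] args) {
        int ppTotal = 10, rpTotal = 20;   // placeholder totals for MT (assumption)
        // Fig. 1.1: NodeCachingLinkedList(), addFirst(Object), removeFirst()
        double fig11 = tieBreaker(3, 34, 0, 1, ppTotal, rpTotal, 2, 1, 2);
        // Fig. 1.5: NodeCachingLinkedList(), add(int, Object), remove(Object)
        double fig15 = tieBreaker(3, 34, 1, 2, ppTotal, rpTotal, 2, 1, 2);
        System.out.println(fig11 > fig15);   // true
    }
}
```

Both sets have three methods, so the first term is identical and the difference comes only from the weighted parameter count: the set in Fig. 1.1 obtains the larger value, matching the preference stated above. Because the summand never exceeds 1, it can only separate candidates whose field extensions have the same number of values.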

It is important to remark that the presented criteria for choosing better builders are based on the kind of program analyses we target (generation of test cases for parameterized tests, software model checking). New criteria can be defined with other goals in mind, and our approach can be adapted to support them by modifying the fitness function as we did in this section.

#### **4.3 Overall Structure of the Genetic Algorithm**

The previously described elements are the constituent parts of the genetic algorithm implementing our approach. Pseudocode for the genetic algorithm is shown in Algorithm 1.

**Algorithm 1.** Genetic Algorithm implementing our approach

Notice that Algorithm 1 follows the general structure of a genetic algorithm. The initial population is generated by producing all the feasible chromosomes with only one available method (vectors with false in all positions except one, set to true) (line 3). Then, it starts to iteratively evolve the population (lines 4–15). At the beginning of each evolution iteration, the algorithm discards some individuals to control population size, by keeping the *popSize* fittest individuals of the current population and discarding the rest (line 5). Then, the algorithm performs single-point crossover on randomly selected individuals (lines 6–10). Crossover is applied a number of times that is proportional to the population size *popSize*, determined by the product of *popSize* and the crossover rate parameter *cRate* (0 ≤ *cRate* ≤ 1). Then, the algorithm mutates individuals (lines 11–15) by changing the value of each of their genes with probability *mRate* (0 ≤ *mRate* ≤ 1). Any individuals newly created by the crossover and mutation operations are added to the population.
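
The steps described above can be sketched in plain Java as follows. This is not the JGap-based implementation used in the paper, and several details are assumptions made for the example: the names `BuilderSearch` and `search`, a crossover that produces a single child, mutation applied to a copy of every surviving individual, and a fitness function that is deterministic and cheap enough to be re-evaluated while sorting (a real implementation would cache fitness values).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class BuilderSearch {
    /** Returns the fittest chromosome found after numEvo evolutions over n methods. */
    static boolean[] search(int n, ToDoubleFunction<boolean[]> fitness, int numEvo,
                            int popSize, double cRate, double mRate, Random rnd) {
        Comparator<boolean[]> fittestFirst = Comparator.comparingDouble(fitness).reversed();
        // Initial population: every chromosome with exactly one method enabled.
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boolean[] single = new boolean[n];
            single[i] = true;
            population.add(single);
        }
        for (int evo = 0; evo < numEvo; evo++) {
            // Keep only the popSize fittest individuals.
            population.sort(fittestFirst);
            if (population.size() > popSize) {
                population = new ArrayList<>(population.subList(0, popSize));
            }
            List<boolean[]> offspring = new ArrayList<>();
            // Single-point crossover, applied popSize * cRate times.
            int crossovers = (int) Math.round(popSize * cRate);
            for (int i = 0; i < crossovers; i++) {
                boolean[] a = population.get(rnd.nextInt(population.size()));
                boolean[] b = population.get(rnd.nextInt(population.size()));
                int point = rnd.nextInt(n);
                boolean[] child = a.clone();
                System.arraycopy(b, point, child, point, n - point);
                offspring.add(child);
            }
            // Mutation: flip each gene of a copy with probability mRate.
            for (boolean[] parent : population) {
                boolean[] mutant = parent.clone();
                boolean changed = false;
                for (int g = 0; g < n; g++) {
                    if (rnd.nextDouble() < mRate) {
                        mutant[g] = !mutant[g];
                        changed = true;
                    }
                }
                if (changed) offspring.add(mutant);
            }
            population.addAll(offspring);
        }
        population.sort(fittestFirst);
        return population.get(0);
    }
}
```

Because the fittest individuals always survive the truncation step, the result is never worse than the best single-method chromosome of the initial population.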

The algorithm stops after *numEvo* evolutions, with *numEvo* a parameter of the algorithm. Notice that we do not have a target value for our fitness, since an untried set of methods might produce field extensions with more values than any set the algorithm has seen so far. There is a compromise to be made when choosing a value for *numEvo*: a larger number increases the precision of the algorithm but also its running time, whereas a smaller number makes it run faster but might not result in the best set of builders.

We found values for the parameters of our algorithm that seem to work well in practice. In our experimental evaluation, we set *numEvo* = 20, *popSize* = 30, *cRate* = 0.35, *mRate* = 0.08 (the last two are the defaults of the JGap library).

Most of Algorithm 1 is a default evolutionary implementation of the JGap Java library [37]. Notice that, if we take away the complexity of the fitness function, our evolutionary algorithm is rather standard, so it is not surprising that an existing implementation works well for our purposes. Of course, improvements to the evolutionary algorithm, and fine tuning for its parameters (e.g., crossover/mutation rate) might yield faster execution times.

We also implemented a simple multi-threaded version of our approach, which helps improve its performance. Basically, at each iteration we make *t* copies of the current population, where *t* is the number of available threads, and evolve each of the population replicas independently of the others. After all the threads have finished, we keep the 100/*t* fittest individuals of the population evolved by each thread, and use them to build the population for the next iteration of the algorithm.

## **4.4 Reducing the Search Space by Observers Classification**

We say a routine is an observer if it never modifies the parameters it takes, and never generates a non-primitive value as a result of its execution. Column Obs? in Table 1 (Sect. 2) indicates whether each NCL method is an observer or not. Clearly, an observer can be used neither to modify nor to build objects, and therefore can never belong to a minimal set of builders. Hence, if we can classify observers correctly beforehand, we can remove them from the search to significantly reduce the search space without losing precision. For example, in the NCL API (Table 1) there are 13 observers out of 34 methods, so removing observers discards more than one third of the candidate methods before the search starts.
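To make the effect on the search space concrete, the following sketch counts candidate method subsets for the NCL figures quoted above. It assumes that every subset of methods corresponds to one candidate chromosome (one boolean gene per method) and ignores feasibility constraints.

```python
METHODS, OBSERVERS = 34, 13  # NCL figures quoted above

subsets_before = 2 ** METHODS                 # candidate subsets, full API
subsets_after = 2 ** (METHODS - OBSERVERS)    # after discarding observers
reduction = subsets_before // subsets_after

print(subsets_before, subsets_after, reduction)
# prints: 17179869184 2097152 8192
```

Under this assumption, dropping 13 of the 34 methods shrinks the number of candidate subsets by a factor of 2^13 = 8192.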

To detect observers we run another customized Randoop version before our evolutionary algorithm. This time, for each test sequence generated by Randoop that involves a given method, we check whether the method modifies its inputs, by canonicalizing the objects before and after the execution of the method and checking whether their field values change. If they do, the method is marked as a builder (not an observer). For return values, if the method returns a non-primitive value in any test sequence generated by Randoop, then we mark it as a builder as well. We run this custom Randoop until it generates a large number of scenarios for each method; ten to twenty seconds were enough for our case studies. At the end of the Randoop execution, methods not marked as builders are considered observers and are discarded before invoking the evolutionary algorithm.
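The classification rule can be illustrated with a small Python sketch. It is not the customized Randoop: the scenarios are hand-written instead of randomly generated, a deep copy compared with `==` stands in for canonicalization, and the `PRIMITIVES` tuple (including `None` for methods that return nothing) is an assumption about what counts as a primitive value.

```python
import copy

# Assumed notion of "primitive" results for this sketch.
PRIMITIVES = (bool, int, float, type(None))

def classify_observers(methods, scenarios):
    """methods: name -> callable; scenarios: name -> list of argument tuples."""
    builders = set()
    for name, fn in methods.items():
        for args in scenarios.get(name, []):
            before = copy.deepcopy(args)      # state of the inputs before the call
            result = fn(*args)
            modified = args != before         # did any input change?
            if modified or not isinstance(result, PRIMITIVES):
                builders.add(name)            # marked as builder in some scenario
                break
    # Methods never marked as builders are treated as observers.
    return set(methods) - builders

# Hypothetical list-like API used only for illustration.
methods = {
    "size": lambda xs: len(xs),
    "contains": lambda xs, x: x in xs,
    "add": lambda xs, x: xs.append(x),
    "copy": lambda xs: list(xs),
}
scenarios = {
    "size": [([],), ([1, 2],)],
    "contains": [([1], 1), ([1], 2)],
    "add": [([], 1)],
    "copy": [([1],)],
}
print(sorted(classify_observers(methods, scenarios)))
# prints: ['contains', 'size']
```

As in the text, the result is only as good as the scenarios: a method that modifies its inputs only in situations that were never exercised would be misclassified as an observer.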

Other approaches exist for the detection of pure methods [15,31] (similar to our observers). Note that our evolutionary algorithm is not dependent on the method classification algorithm, so any of them could be useful for our purposes.

## **5 Experimental Results**

In this section, we experimentally assess our approach. The evaluation is based on a benchmark of data structure implementations, including: NCL from Apache Collections [36]; BinaryTree, BinomialHeap, FibonacciHeap, RedBlackTree taken from [35]; UnionFind, an implementation of disjoint sets taken from JGrapht [38]. We also evaluate our technique on components of real software projects such as Lits from the implementation of Sat4j [3], taken from [20], which consists of a variable store that monitors when a guess was last made about a value of a variable, and whether listeners are watching the state of that variable; and Scheduler, an implementation of a process scheduler taken from [10]. All the experiments were run on 3.4 GHz quad-core Intel Core i7-6700 machines with 8 GB of RAM, running GNU/Linux.

The evaluation consists of two parts. First, we ran our approach (Algorithm 1) on the whole module APIs of the aforementioned classes, to compute sets of builders for each case study. The goal is to assess how good the identified builders are, and how long our approach takes to compute them. For each case study we ran our approach 5 times. The results are shown in Table 2, including the number of routines in the whole API (#API), a sample of identified builders (some methods might be interchanged in different runs, e.g., addFirst and addLast in NCL), and the average running time (in seconds) of the 5 runs. We manually inspected the results, and found that the automatically identified sets of builders were in all cases sufficient (all the feasible objects for the structure can be constructed using the builders) and minimal (they do not contain superfluous methods). The approach is reasonably efficient, taking about 30 min in the worst case.

The second part of the evaluation concerns how helpful the identified builders are in the context of a program analysis, namely, the automated generation of test cases. The objects created by such tests might be used, for example, as inputs in parameterized unit tests. For the case studies that provide mechanisms to measure the size of objects and to compare objects by equality (i.e., the *size* and *equals* methods of data structures), we generated tests with Randoop using all the methods available in the API (API), and then we generated tests with Randoop using only the builder methods (BLD) identified by our approach in the previous experiment (Table 2). We then compared the number of different objects (No. of Objs.) and the size of the largest object (Max Obj. Size) created by the tests generated from the API against those created by the tests generated using methods from BLD only. We set three different test generation budgets: 60, 120 and 180 seconds (Budget). The results are summarized in Table 3. In addition, we consider another approach, API+, that involves the generation of tests using the API for a budget that encompasses the test generation budget (Budget) plus the time it takes our approach to identify builders for the corresponding case study. The results show that, with the same test budget, BLD generates on average 1280% more objects than API. Furthermore, when the builders identification time is added to the test generation budget for API (API+), BLD still generates on average 568% more objects (w.r.t. API+). In all cases, BLD also generates significantly larger objects than API and API+. In view of these results, automated builders identification clearly pays off for the automated generation of structures for stateful classes.
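The two metrics and the "percent more" comparison can be sketched as follows. The function names are hypothetical, Python's `==` and `len` stand in for the *equals* and *size* methods, and the numbers in the usage lines are invented purely to illustrate the arithmetic; they are not taken from Table 3.

```python
def summarize(objects, size=len):
    """Return (number of distinct objects, size of the largest object)."""
    distinct = []
    for obj in objects:
        # Equality-based deduplication, mirroring the use of equals.
        if not any(obj == seen for seen in distinct):
            distinct.append(obj)
    largest = max((size(o) for o in distinct), default=0)
    return len(distinct), largest

def percent_more(new, base):
    """Relative increase of new over base, as a percentage."""
    return 100.0 * (new - base) / base

# Invented example values, for illustration only.
print(summarize([[1], [1], [1, 2], []]))   # prints: (3, 2)
print(percent_more(138, 10))               # prints: 1280.0
```

Read this way, "1280% more objects" means that BLD produces 13.8 times as many distinct objects as the baseline.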

The experiments can be reproduced by following the instructions on the paper website [27]. Furthermore, on the site we show experimentally that the builders identified by our approach can be employed to build efficient drivers for software model checking. We do not show these results here due to space constraints.


**Table 2.** Builders computation results

**Table 3.** Assessment of using the identified builders (BLD) vs the whole API (API) in test case generation. API+ involves test case generation with the whole API, with budget = (Budget + builders computation time)


## **6 Related Work**

As mentioned throughout the paper, the problem of identifying sufficient builders is recurrent in various program analyses, including but not limited to software model checking and test generation. The problem of identifying part of an API and providing it for analysis is present, to cite just a few examples, in works such as [18,23] in the context of software model checking, and [5,24,32,33] in the context of automated test generation. Typically the problem is dealt with manually.

The use of search-based techniques to solve challenging software engineering problems is an increasingly popular strategy, which has been applied successfully to a number of problems, including test input generation [11], program repair [19], and many others. As far as we are aware, this is a novel application of evolutionary computation in software engineering. A related but different problem is tackled by the SUSHI tool [5]. The aim of SUSHI is to feed a genetic algorithm with a *path condition*, produced by a symbolic execution engine, so that an input satisfying the provided path condition can be reproduced using a module's API. This approach assumes that the API (or the subset of relevant methods) is provided, whereas our work tackles precisely the provision of the restricted API.

Our technique requires a mechanism for identifying *observers*, which we implemented in this work by resorting to random test generation and instrumentation for state monitoring. Approaches to the identification of observers, or more precisely *pure methods*, exist in the literature [15,31]. Regarding these lines of work, note that the focus of our evolutionary algorithm is not the identification of observers, but the construction of minimal and sufficient sets of builders. Moreover, our approach is in fact independent of the mechanism used to identify observers/pure methods, and thus could be combined with the works just cited (i.e., by replacing our random-testing-based approach with an alternative one).

## **7 Conclusions**

In this work, we presented an evolutionary algorithm for automatically detecting sets of builders from a module's API. We assessed our algorithm over several case studies from the literature, and found that it is capable of precisely identifying sets of builders that are sufficient and minimal, within reasonable running times. To the best of our knowledge, this is the first work that addresses this problem, which is typically dealt with manually.

We also showed preliminary results indicating that our approach can be exploited by test case generation tools to yield larger and more diverse objects. Other techniques, like software model checking, can benefit as well by using the identified set of builders to automatically construct efficient drivers. More experimentation needs to be done, but given the results in this paper our approach looks very promising.

One of the biggest challenges of this work was the construction of a tool to allow us to generate all the bounded structures, for a given maximum number *k* of objects, from the methods of the program API. The proposed solution worked well enough for our case studies, but avoiding randomness in the process would be desirable. Using bounded exhaustive generation tools rather than random generation would better fit our purposes [4], but unfortunately none of the tools for bounded exhaustive test generation produce inputs from a module's API. We believe that a promising research direction, that we plan to further explore in future work, is to adapt our presented approach for bounded exhaustive test generation.

Some aspects of our genetic algorithm can be further improved. For instance, a more powerful classification of argument types can be defined for the prioritization of methods according to their complexities. Moreover, one may also incorporate other dimensions, such as *code complexity*, to favor simpler methods. We will explore this direction as future work. Also, our genetic algorithm implementation is, for the most part, a default evolutionary implementation from the JGap Java library [37]. Of course, improvements to the evolutionary algorithm and fine-tuning of its parameters (e.g., crossover/mutation rate) might yield faster execution times, so we plan to investigate this further in future work.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Author Index

Aguirre, Nazareno 427
Arora, Himanshu 228
Bengolea, Valeria S. 427
Beyer, Dirk 389
Bezirgiannis, Nikolaos 332
Boronat, Artur 134
Bravetti, Mario 351
Chechik, Marsha 3
Chen, Xiaohong 61
Cleophas, Loek 25
de Boer, Frank 332
Diab, Moustapha 264
Dimovski, Aleksandar S. 192
Diskin, Zinovy 264
Dubrulle, Paul 369
Dumas, Marlon 306
Emre, Mehmet 247
Eniser, Hasan Ferit 171
Frias, Marcelo F. 427
Fritsche, Lars 116
García-Bañuelos, Luciano 306
Gaston, Christophe 369
Gerasimou, Simos 171
Gharachorlu, Golnaz 409
Giallorenzo, Saverio 351
Giese, Holger 282
Hardekopf, Ben 247
Hennicker, Rolf 79
Huang, Li 210
Jakobs, Marie-Christine 389
Johnsen, Einar Broch 332
Jordan, Alexander 43
Kang, Eun-Young 210
Knapp, Alexander 79
Kokaly, Sahar 3
Komondoor, Raghavan 228
Kosiol, Jens 116
Kosmatov, Nikolai 369
Kourie, Derrick 25
Lambers, Leen 151
Lapitre, Arnault 369
Laud, Peeter 306
Lawford, Mark 264
Legay, Axel 192
Louise, Stéphane 369
Madeira, Alexandre 79
Mallet, Frédéric 61
Matulevičius, Raimundas 306
Mauro, Jacopo 351
Maximova, Maria 282
Milo, Curtis 264
Naujokat, Stefan 101
Nichols, Lawton 247
Ogata, Kazuhiro 299
Orejas, Fernando 151
Pankova, Alisa 306
Pantelic, Vera 264
Park, Joonyoung 43
Peng, Chao 315
Pettai, Martin 306
Politano, Mariano 427
Ponzio, Pablo 427
Pullonen, Pille 306
Pun, Ka I 332
Qian, Jiaqi 299
Rahimi, Mona 3
Rajan, Ajitha 315
Ramalingam, G. 228
Runge, Tobias 25
Ryu, Sukyoung 43
Sakizloglou, Lucas 282
Salay, Rick 3
Schaefer, Ina 25
Schneider, Sven 151, 282
Schürr, Andy 116
Selim, Gehan 264
Sen, Alper 171
Song, Fu 61
Steffen, Bernhard 101
Sumner, Nick 409
Taentzer, Gabriele 116
Talevi, Iacopo 351
Tapia Tarifa, S. Lizeth 332
Thüm, Thomas 25
Tom, Jake 306
Toots, Aivo 306
Tuuling, Reedik 306
Viger, Torin 3
Wang, Yi 299
Wasowski, Andrzej 192
Watson, Bruce W. 25
Weslati, Feisel 264
Wynn-Williams, Stephen 264
Yerokhin, Maksym 306
Zavattaro, Gianluigi 351
Zhang, Min 61, 299
Zweihoff, Philip 101